1. Introduction


The  idea  of  this  system  is  to  use  idle  processors  to  increase  the  speed  of  NQTHM.  There  were  many 
design  considerations,  including: 

• No shared file system. We want to allow the dispatcher to use processors besides the ones
connected to our file system. This also allows us to implement a distributed system that has
a chance to recover from any problems that occur.

• Easy to kill jobs on a given host. Ultimately, if all else fails, we want a way to kill ALL
processes we have created. This also gives machine users some measure of control.

• Easy to prevent job assignment on a given host. Machine users must be able to remove
their processor from the shared pool if they wish.

• Robust. The system should be able to recover from most kinds of processor or network
problems.

• Nice user interface. Plenty of flexibility with reasonable defaults, output that tells the user
what he needs to know, etc.

We wish to use the system to process an NQTHM event list in parallel, with a given "granularity". If we
have a granularity of n, then the first job will process the first n PROVE-LEMMAs. Each subsequent job
will process the next n PROVE-LEMMAs. Since an arbitrary subsequence of PROVE-LEMMAs may require a
previous definition, each job will process all previous definitions. A subsequence of PROVE-LEMMAs
may also require PROVE-LEMMAs from a previous subsequence, so each job will process all
PROVE-LEMMAs from previous subsequences, treating each such PROVE-LEMMA as an ADD-AXIOM.
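The splitting scheme just described can be illustrated with a short Python sketch. This is not the system's Lisp code: the function name `build_jobs`, the `(kind, name)` event representation, and the choice to drop trailing events that follow the last PROVE-LEMMA are all assumptions made for illustration.

```python
def build_jobs(events, granularity):
    """Split an event list into jobs holding `granularity` PROVE-LEMMAs each.

    `events` is a list of (kind, name) pairs in file order (a hypothetical
    representation). Each job replays every earlier event first, with earlier
    PROVE-LEMMAs demoted to ADD-AXIOMs so they are assumed rather than re-proved.
    """
    if granularity < 1:
        raise ValueError("granularity must be at least 1")

    def demote(event):
        kind, name = event
        return ("ADD-AXIOM", name) if kind == "PROVE-LEMMA" else event

    jobs = []
    replay = []   # earlier events, lemmas already demoted to axioms
    chunk = []    # events that belong to the job being built
    lemmas = 0    # PROVE-LEMMAs collected in the current chunk
    for event in events:
        chunk.append(event)
        if event[0] == "PROVE-LEMMA":
            lemmas += 1
        if lemmas == granularity:
            jobs.append(replay + chunk)
            replay = replay + [demote(e) for e in chunk]
            chunk, lemmas = [], 0
    if lemmas > 0:            # final, possibly short, job
        jobs.append(replay + chunk)
    return jobs
```

With a granularity of 2, an event list containing three PROVE-LEMMAs yields two jobs; the second job sees the first two lemmas only as axioms.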

This system allows the user to run a job on multiple machines in parallel. The issue of machine use has
been discussed at Computational Logic and a preliminary policy adopted. It is the policy at
Computational Logic to allow users to run parallel jobs using the dispatcher. Spawned jobs are to be run
at the default (nice) priority level, and it is OK for any user to "kick" spawned parallel jobs off his local
machine if he wishes (without feeling bad about it!).


Most  users  will  only  need  or  want  to  read  Subsection  2.1. 

Section  2  is  a  User’s  Manual  for  the  system.  Section  2.1  describes  basic  use  of  the  system  and  includes 
everything  most  people  will  need  to  use  the  system.  Section  2.2  describes  options  to  the  system.  Section 
2.3  describes  some  hooks  that  allow  customization  of  the  system.  Section  2.4  describes  the  use  of  the 
dispatcher,  the  part  of  the  system  that  distributes  the  work  to  other  processors  that  may  be  used  on  work 
other  than  NQTHM  jobs. 

Section  3  is  a  System  Guide.  Section  3.1  describes  how  the  dispatcher  is  implemented.  Section  3.2 
describes  how  events  files  are  broken  up  and  how  the  dispatcher  is  applied  to  this  problem. 

Section 4 includes results and conclusions. Section 4.1 compares the system's performance against
sequential runs. Section 4.2 suggests future work. Section 4.3 summarizes our conclusions.

Appendix  A  describes  the  instrumentation  added  to  the  system  to  keep  track  of  its  use.  Appendix  B 
describes  an  experiment  where  the  dispatcher  is  used  to  compile  code  in  parallel.  Appendix  C  is  the  code. 


Acknowledgements: We'd like to thank Bob Boyer and Ross Overbeek for their help. Ross wrote a
program that gave us the idea to build this system. Bob created the "nuke-all" shell script for us and
suggested the basic paradigms for processing nqthm events and for compiling nqthm in parallel. Both
provided advice and encouragement throughout the project.
2. A User's Manual


2.1  Basic  Use 


2.1-A  An  example 

The  following  example  shows  how  the  function  DO-FILE-PARALLEL  may  be  called  to  run  NQTHM 
events  in  parallel.  Comments  are  inserted  in  italics.  Later  subsections  explain  which  directories  and  files 
need  to  be  present  for  this  function  to  execute  successfully,  and  some  commands  that  control  parallel  jobs. 

cli:wilding[46]% akcl                                {start up Lisp}

AKCL (Austin Kyoto Common Lisp)  Version(1.57) Thu Sep 29 21:27:15 CDT 1988
Contains Enhancements by W. Schelter

>(load "/usr/local/src/parallel/top.lsp")            {Load in the parallel NQTHM-manipulation code}

Loading /usr/local/src/parallel/top.lsp
Loading /usr/local/src/parallel/from-nqthm.o
start address -T 191000 Finished loading /usr/local/src/parallel/from-nqthm.o
Finished loading /usr/local/src/parallel/top.lsp
T

>(load-dispatcher)                                   {Load in parallel NQTHM code}

Loading /local/src/parallel/dispatch.o
start address -T 1c8800 Finished loading /local/src/parallel/dispatch.o
Loading /local/src/parallel/bm.o
start address -T 1d0800 Finished loading /local/src/parallel/bm.o
NIL

>(do-file-parallel "/usr/home/kaufmann/demo-permutationp.events" 2)
                                                     {Execute events in file with 2 PROVE-LEMMA events per job}

Clearing directory jobs/.
Creating jobs files... done.
Invoking dispatcher.
Initializing job information... done.
Clearing directory output/.
Clearing directory temp/.
Hosts requested: ("client12" "cli" "anderson" "jingles" "scarab"
                  "decaf" "client13" "elgin" "oscar").
Hosts currently blocked: NONE.

starting»   client12 : DELETE (job# 1)                           0:00
starting»   cli : DELETE-COMMUTATIVITY (job# 2)                  0:00
starting»   anderson : PERMUTATIONP-PRESERVES-MEMBER (job# 3)    0:00
starting»   jingles : PERMUTATIONP-TRANSITIVE (job# 4)           0:00
completed»  anderson : PERMUTATIONP-PRESERVES-MEMBER (job# 3)    0:01
completed»  client12 : DELETE (job# 1)                           0:01
completed»  cli : DELETE-COMMUTATIVITY (job# 2)                  0:01
completed»  jingles : PERMUTATIONP-TRANSITIVE (job# 4)           0:01

All events have been run successfully.
T

>
2.1-B Environmental requirements


jobs/             temp/             output/                  hosts (optional)
  <event1>          finish.1          <event1>.output
  <event2>          finish.2          <event1>.status
                    output.1          <event2>.output
                    output.2          <event2>.status
                    status.1
                    status.2

The  system  uses  files  in  three  subdirectories  of  the  current  working  directory.  By  default,  these 
subdirectories  are  called  jobs/,  temp/,  and  output/.  A  user  may  also  optionally  have  a  file  named  hosts  in 
the  current  working  directory. 

The jobs/ subdirectory contains the input file associated with each of the processes to be executed. It is
created automatically from the input events file and the granularity (the number of PROVE-LEMMAs
requested per job). The jobs are assigned to processors in alphabetical order. The name of each job is the
name of the first PROVE-LEMMA in the file.[1]

cli:wilding[47]% ls jobs
DELETE-COMMUTATIVITY            PERMUTATIONP-PRESERVES-MEMBER
MEMBER-APPEND                   PERMUTATIONP-TRANSITIVE


The temp/ subdirectory contains files used temporarily during the execution of the jobs. Each scheduled
job is assigned a unique number. There are three files associated with each scheduled job contained in the
temp/ subdirectory. The output.n file is the output created by job n. The status.n file communicates
whether the nqthm events were successfully (or unsuccessfully) processed to completion, or whether the
job was terminated before it could complete. The finish.n file is created when the remote job finishes.

By default, when the finish.n file is created the output.n and status.n files are moved to the output/
subdirectory described below. Note that if the system does not have to recover from errors, there will be
as many scheduled jobs as there are files in the jobs/ directory.


cli:wilding[48]% ls temp
finish.1   finish.4   output.2   status.1   status.4
finish.2   junk.lsp   output.3   status.2
finish.3   output.1   output.4   status.3

The output/ subdirectory contains the output from jobs, and information about the status of each job. If
scheduled job n executes job X from the jobs/ subdirectory without error, then X.output in the output/
directory is identical to output.n in the temp/ directory and X.status in the output/ directory is identical to
status.n in the temp/ directory.
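The relationship between the numbered temp/ files and the named output/ files can be sketched in Python as follows. This is only an illustration under stated assumptions: the helper name `collect_job` is hypothetical, and the real system does this work in Lisp and may differ in detail.

```python
import shutil
from pathlib import Path


def collect_job(job_name, n, temp_dir="temp", output_dir="output"):
    """Move output.n and status.n to <job_name>.output and <job_name>.status.

    Nothing is moved until finish.n exists, mirroring the default behaviour
    described in the text. Returns True if the files were moved.
    """
    temp, out = Path(temp_dir), Path(output_dir)
    if not (temp / f"finish.{n}").exists():
        return False          # the remote job has not finished yet
    for kind in ("output", "status"):
        shutil.move(str(temp / f"{kind}.{n}"), str(out / f"{job_name}.{kind}"))
    return True
```

For instance, once temp/finish.3 exists, `collect_job("PERMUTATIONP-TRANSITIVE", 3)` would leave PERMUTATIONP-TRANSITIVE.output and PERMUTATIONP-TRANSITIVE.status in output/.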


[1] If the file contains BOOT-STRAPs or NOTE-LIBs, then job naming and assigning is slightly different. Events of that type are
always the first events in a job, so the previous job may have fewer PROVE-LEMMAs than the granularity. Job names are the
first event name in the job followed by a separator and then a number that is the location in the file of the most recent BOOT-STRAP
or NOTE-LIB. This is necessary since an events file with BOOT-STRAPs or NOTE-LIBs may have duplicate event names.
cli:wilding[49]% ls output
DELETE-COMMUTATIVITY.output             PERMUTATIONP-PRESERVES-MEMBER.output
DELETE-COMMUTATIVITY.status             PERMUTATIONP-PRESERVES-MEMBER.status
MEMBER-APPEND.output                    PERMUTATIONP-TRANSITIVE.output
MEMBER-APPEND.status                    PERMUTATIONP-TRANSITIVE.status


If the current directory contains a file named (by default) "hosts", then it is used to find hosts upon which
to run. Each host name must appear on a separate line. The number of times a host name appears in the
hosts file is the number of processes that may be simultaneously placed on it. (For example, if "cli"
appears twice in the hosts file, then the host cli may have two simultaneous processes running on it.)

If  a  hosts  file  does  not  exist  in  the  local  directory,  a  system  default  is  used. 
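As a small illustration of the hosts file convention, the following Python sketch counts how many simultaneous processes each host may receive. The function name `host_slots` is hypothetical, and skipping blank lines is an assumption; the system's own parsing is done in Lisp.

```python
from collections import Counter


def host_slots(hosts_text):
    """Return the number of simultaneous processes allowed per host.

    Each non-blank line names one host; a host listed k times gets k slots.
    """
    names = [line.strip() for line in hosts_text.splitlines() if line.strip()]
    return Counter(names)
```

A file listing "cli" twice and "anderson" once gives cli two slots and anderson one.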

cli:wilding[50]% cat hosts
client12
cli
anderson
jingles
scarab
decaf
client13
elgin
oscar


2.1-C  Problems  That  are  Detected 

There are two kinds of errors that are detected by the system. First, if a job finishes but does not produce
the appropriate output, then the system concludes that the remote job ended abnormally and reschedules
the work. (To construct the following example, one remote process was killed. The system detected the
problem and rescheduled the job.)

>(do-file-parallel "/usr/home/kaufmann/demo-permutationp.events" 3
                   :kill-if-no-progress 60)

Clearing directory jobs/.
Creating jobs files... done.
Invoking dispatcher.
Initializing job information... done.
Clearing directory output/.
Clearing directory temp/.
Hosts requested: ("client12" "cli" "anderson" "jingles" "scarab"
                  "decaf" "client13" "elgin" "oscar").
Hosts currently blocked: NONE.

starting»         client12 : DELETE (job# 1)                    0:00
starting»         cli : MEMBER-DELETE-OTHER (job# 2)            0:00
starting»         anderson : PERMUTATIONP-TRANSITIVE (job# 3)   0:00
***NOT completed» anderson : PERMUTATIONP-TRANSITIVE (job# 3)   0:00
starting»         anderson : PERMUTATIONP-TRANSITIVE (job# 4)   0:00
completed»        client12 : DELETE (job# 1)                    0:01
completed»        cli : MEMBER-DELETE-OTHER (job# 2)            0:01
completed»        anderson : PERMUTATIONP-TRANSITIVE (job# 4)   0:01
All events have been run successfully.
T

>

The  other  type  of  error  detected  by  the  system  is  when  no  progress  is  made.  If  the  output  file  is  not 
written  to  for  too  long,  the  system  kills  the  local  process  (in  case  it  is  not  already  dead)  and  restarts  the 
job.  (To  construct  this  example,  all  local  processes  related  to  parallel  jobs  were  killed  on  the  local  host. 
After a while the system detected the problem and took action.)

>(do-file-parallel "/usr/home/kaufmann/demo-permutationp.events" 3
                   :kill-if-no-progress 60)

Clearing directory jobs/.
Creating jobs files... done.
Invoking dispatcher.
Initializing job information... done.
Clearing directory output/.
Clearing directory temp/.
Hosts requested: ("client12" "cli" "anderson" "jingles" "scarab"
                  "decaf" "client13" "elgin" "oscar").
Hosts currently blocked: NONE.

starting»    client12 : DELETE (job# 1)                    0:00
starting»    cli : MEMBER-DELETE-OTHER (job# 2)            0:00
starting»    anderson : PERMUTATIONP-TRANSITIVE (job# 3)   0:00
*** KILLED»  client12 : DELETE (job# 1)                    0:02
*** KILLED»  cli : MEMBER-DELETE-OTHER (job# 2)            0:02
*** KILLED»  anderson : PERMUTATIONP-TRANSITIVE (job# 3)   0:02
starting»    client12 : DELETE (job# 4)                    0:02
starting»    cli : MEMBER-DELETE-OTHER (job# 5)            0:02
starting»    anderson : PERMUTATIONP-TRANSITIVE (job# 6)   0:02
completed»   client12 : DELETE (job# 4)                    0:03
completed»   cli : MEMBER-DELETE-OTHER (job# 5)            0:03
completed»   anderson : PERMUTATIONP-TRANSITIVE (job# 6)   0:03
All events have been run successfully.
T

>

2.1-D The Par commands

There are several commands that are useful for controlling parallel jobs. They are used from the shell,
and none of them has arguments. Note that it is the policy at Computational Logic that it is OK for any
user to execute any of these commands whenever he wishes.

• parstop -- block the local machine from getting more par jobs assigned to it (blocks are
removed at 8PM and 8AM every day) and kill all par jobs already running on this machine.[2]

• parblock -- block the local machine from getting more par jobs assigned to it (blocks are
removed at 8PM and 8AM every day)

• parunblock -- free the local machine for par jobs

• parshowblocked -- show which machines are currently blocked

• parkill -- kill all par jobs on this machine

• parcount -- give the number of jobs owned by user par on the local processor (this is
usually twice the number of rsh's on the local machine)

Here’s  an  example  of  these  commands  in  action: 


client12:wilding[11]% parshowblocked
jingles
client12:wilding[12]% parstop
client12 blocked
All PAR jobs killed on client12
client12:wilding[13]% parshowblocked
client12 jingles
client12:wilding[14]% parunblock
client12 unblocked
client12:wilding[15]% parcount
3 PAR jobs executing
client12:wilding[16]% parkill            {kill the current jobs but allow later ones}
client12:wilding[17]% parcount
0 PAR jobs executing
client12:wilding[18]% parcount
2 PAR jobs executing
client12:wilding[19]% parstop            {kill the current jobs and prohibit later ones}
client12 blocked
All PAR jobs killed on client12
client12:wilding[20]% parcount
0 PAR jobs executing
client12:wilding[21]%


[2] This command is simply a parblock followed by a parkill.
2.2 DO-FILE-PARALLEL options


There  are  several  options  available  when  using  DO-FILE-PARALLEL  to  run  NQTHM  jobs  in  parallel. 
They  are  passed  as  key  parameters  in  the  function  invocation.  The  general  form  of  an  invocation  of 
DO-FILE-PARALLEL  is: 

(do-file-parallel infile granularity
                  :jobs-directory-name   <directory-name>
                  :output-directory-name <directory-name>
                  :hosts-file-name       <file-name>
                  :local-host-first      <t | nil>
                  :kill-if-no-progress   <number of seconds>
                  :command-name          <command-name>
                  :front-end             <file-name>
                  :delay                 <nil | number of seconds>
                  :nice-flag             <t | nil> )


2.2-A :jobs-directory-name
Default: "jobs/"

The jobs directory name is a string that contains the name of the subdirectory that is to be used to hold the
input files for the dispatcher. If the parameter given is not a real subdirectory name for the current
directory, or if the parameter does not end with a '/' character, then an error is reported. The
jobs-directory-name subdirectory is cleared with every run of DO-FILE-PARALLEL.

2.2-B :output-directory-name
Default: "output/"

The output directory name is a string that contains the name of the subdirectory that is to be used to hold
the output files from the jobs. If the parameter given is not a real subdirectory name for the current
directory, or if the parameter does not end with a '/' character, then an error is reported. The
output-directory-name subdirectory is cleared with every run of DO-FILE-PARALLEL.

2.2-C :hosts-file-name

Default: "hosts"

The hosts file name is a string that contains the name of a file that contains the hosts upon which to run
the parallel jobs. As described in section 2, if the hosts file exists (either the default "hosts" in the
current directory or the file specified with the :hosts-file-name key parameter) then it is used to provide
the list of host names. If the hosts file does not exist, then a system-wide default is used instead.

The hosts file contains one host per line. Each host name should be a valid host accessible to the user
with an rsh command from the local host. A host may have as many processes assigned to it as there are
occurrences of its name in the hosts file.

2.2-D :local-host-first

Default: t

A list of hosts is maintained by the system for use in assigning jobs when necessary. Since jobs are
assigned to the hosts on this list starting from the beginning, the order of the hosts in the list affects the
job. This is particularly important when one considers system errors: a job that does not complete
successfully is reassigned to the first host on the list that does not have a job assigned to it, even if that
host was the one that just failed! If the first host is inaccessible for some reason, the system will loop
forever reassigning a job to that host.

Since we are already relying on the local host, in general we want the local host to appear first in the host
list. If :local-host-first is non-nil, then the host list lists the hosts in the same order as was found in the
hosts file, except that occurrences of the local host are moved to the front of the list. If :local-host-first is
nil, then the host list is in the same order as was found in the hosts file.
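The reordering rule can be written down in a few lines of Python. This is a sketch for illustration only; the function name `order_hosts` and its argument names are hypothetical, and the system's own implementation is in Lisp.

```python
def order_hosts(hosts, local_host, local_host_first=True):
    """Return the host list used for job assignment.

    With local_host_first set, every occurrence of the local host moves to the
    front; all other hosts keep their order from the hosts file.
    """
    if not local_host_first:
        return list(hosts)
    local = [h for h in hosts if h == local_host]
    others = [h for h in hosts if h != local_host]
    return local + others
```

For a hosts file listing anderson, client12, oscar, cli and a local host of cli, the resulting list starts with cli, which matches the order shown in the "Hosts requested" line of the dispatcher example in section 2.4.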

2.2-E :kill-if-no-progress
Default: 200

As described in section 2, remote jobs that do not produce output for too long are killed and their work
rescheduled. The minimum "safe" time in seconds before a job may be killed is given with the
parameter :kill-if-no-progress.

If the value of this parameter is n, then the processor checks every n seconds to see if any processes
should be killed. If a process has not written to its output file in the last n seconds, then it is killed and the
work rescheduled.[3]

The garbage collection message is turned on in remote jobs (in the default front-end file; see :front-end
below) when they are set up, to guarantee that long periods of time between output updates really mean
trouble and not just NQTHM taking a long time.
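The timing of this check can be illustrated with a Python sketch. It assumes, purely for illustration, that checks happen at exact multiples of n seconds and that the last write time of each output file is known; the function names are hypothetical.

```python
def stalled_jobs(last_write, now, n):
    """Jobs whose output file has not been written in the last n seconds.

    `last_write` maps a job name to the time (in seconds) of its last output.
    """
    return sorted(job for job, t in last_write.items() if now - t >= n)


def first_kill_check(last_write_time, n):
    """Time of the first periodic check that finds the job stalled.

    Checks are assumed to run at multiples of n, so the result is the first
    multiple of n that is at least n seconds after the last write.
    """
    earliest = last_write_time + n
    return -(-earliest // n) * n   # round up to a multiple of n
```

Under these assumptions a job that stops writing at second 1 with n = 60 is killed at the check at second 120, that is, after at least n but fewer than 2*n silent seconds.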

2.2-F :command-name
Default: "pc-nqthm"

The command name is the command that is to be run on the remote hosts. It must be accessible on the
remote host.

Probably the only time a user would want to change the default command is to use another version of the
theorem prover, for example "nqthm".

2.2-G :front-end

Default: "/local/src/parallel/front-end.lsp"

The front-end is a string that is the name of a file containing forms to be sent to the remote processes
before the job-specific info and the events. The default contains code that helps set up the remote process
for running the subsequence of events that need to be run by that job.

Most users will not want to mess with this parameter.


[3] This means that a process may take longer than n seconds to be killed once it stops producing output, but no longer than 2*n
seconds.
2.2-H :delay

Default:  nil 

The parameter describes how often the local host will check to see if there are jobs that have completed.
If the number n is provided, then the local host will sleep n seconds between checking for completions.

If nil is provided (or the default is used) then the delay will be set equal to the granularity. Thus, if each
job processes 20 PROVE-LEMMAs and the delay is set to nil, then the local processor will sleep 20
seconds after checking for completed jobs.
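The defaulting rule amounts to a one-line function, shown here in Python with `None` standing in for Lisp's nil. The name `effective_delay` is hypothetical.

```python
def effective_delay(delay, granularity):
    """Seconds to sleep between completion checks.

    A delay of None (standing in for nil) falls back to the granularity.
    """
    return granularity if delay is None else delay
```

With a granularity of 20 and no explicit delay, the local host sleeps 20 seconds between checks.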

2.2-I :nice-flag

Default:  t 

If  nice-flag  is  non-nil,  then  the  jobs  run  on  remote  machines  will  be  run  using  "nice".  That  is,  they  will 
run  so  that  they  have  a  lower  priority  than  most  other  jobs  on  the  system.  If  nice-flag  is  nil,  then  the 
spawned  jobs  will  be  run  at  normal  priority. 

Note  that  even  nice  jobs  take  resources,  so  running  at  a  lower  priority  will  not  guarantee  that  running  a 
parallel  job  will  have  no  effect  on  the  systems  used. 

It  is  the  policy  at  Computational  Logic  to  run  jobs  at  the  default  (nice)  priority  level. 
2.3 Some Implementation Hooks


This section describes global variables whose values are used by the system. Using them, the user may
customize the system somewhat.

The following variables may be set by the user.

• *output-completed-string* (Default: "Boyer-Moore job terminated") This string appears in
the status file of a job. It signifies that the job completed. Whether the job completed
successfully or with failure is communicated on the line after this line.

• *no-io-in-parallel-flag* (Default: nil) This flag directs whether the remote processes should
produce the complete NQTHM output or an abbreviated version. If non-nil the abbreviated
version is produced.

• *delete-jobs-flag* (Default: nil) This flag directs whether the job input files should be
deleted after they are all processed. If nil the job input files are retained.

• *clear-query-flag* (Default: nil) This flag directs whether the user should be queried
before the jobs, temp, and output subdirectories are cleared. If non-nil the query is
performed.

• *local-host-warning* (Default: t) This flag directs whether the user should receive a
warning if the first host in the host list is not his local host. This is checked after the host list
is possibly rearranged to move the local host to the front of the host list (as described in
section 2.2). As noted before, if the first host is inaccessible for some reason, the system
will loop forever reassigning a job to that host.

• *save-temp-files-flg* (Default: nil) This flag directs whether the files in the temp/
subdirectory should be copied or moved to the output directory when a job completes. If
non-nil, the temp files are copied (and therefore saved).

• *kill-dispatcher-upon-seeing-failure* (Default: nil) This flag directs whether the parallel
job should finish as soon as failure is detected. If non-nil, the job stops as soon as any of the
jobs returns a failure.

• *system-parallel-directory* (Default: "/local/src/parallel/") This directory contains
various files the dispatcher needs to operate. These files are discussed in Section 3.1.

• *parkill-command* (Default: "/local/bin/parkill") This string is submitted to the system if
the system ends abnormally (e.g. the user aborts).

• *protected-hosts-subdirectory* (Default: "protected-hosts/") This subdirectory of
*system-parallel-directory* contains the "block" files used by the dispatcher. (See Section 3.1.)

• *lock-file-name* (Default: "lock-out-others.par") This is the name of the lock file that is
used to lock the current directory from use as a parallel job current directory for someone
else.

• *output-info-subdirectory* (Default: "statistics/runs/") This subdirectory of
*system-parallel-directory* holds the files that record job progress. These files are designed to help
track system use. (See Appendix A.)

• *all-valid-hosts-names* (Default: nil) If non-nil, host names from the hosts file are checked
to make sure that they appear in this list. If they do not appear, a warning message appears.
2.4 Dispatcher Use


This section is intended for someone who wishes to use the dispatcher part of this system independently.
We have tried to keep the dispatcher functionally separate in order to make it applicable to other problems
than parallel NQTHM. If you just want to use this system to run NQTHM jobs in parallel, this subsection
probably will NOT do you any good.

DON’T  LET  THIS  SECTION  CONFUSE  YOU  IF  YOU  SIMPLY  WANT  TO  USE  THIS  SYSTEM  TO 
RUN  NQTHM  JOBS  IN  PARALLEL.  It  is  intended  for  those  who  want  to  build  systems  to  do  parallel 
work  with  other  kinds  of  jobs. 

Section  3.1  describes  the  dispatcher’s  implementation,  and  section  3.2  describes  how  the  system  is  built 
on  top  of  the  dispatcher.  Appendix  B  contains  an  example  of  using  the  dispatcher  to  compile  in  parallel. 

Dispatcher  Use  Overview 

The  dispatcher  takes  work  and  passes  that  work  out  to  some  processors.  When  a  processor  is  done  with  a 
task,  the  dispatcher  updates  its  records  and  assigns  a  new  job  to  the  processor.  Eventually  all  the  work 
will  be  completed  and  the  dispatcher  will  return  with  a  value  reflecting  whether  all  the  jobs  were 
successful. 

The  dispatcher  expects  there  to  be  in  the  current  directory  a  jobs  directory  (default  name  :  "jobs/")  that 
contains  the  work  to  be  done.  By  default,  each  file  in  the  jobs  directory  contains  the  input  to  the 
command  to  be  run  remotely. 

During  the  execution  of  the  dispatcher,  a  temp  subdirectory  (name:  "temp/")  is  used  to  keep  track  of  the 
jobs  as  they  progress. 

After a job completes, its standard output is moved from the temp/ subdirectory to the output
subdirectory (default name "output/"). If the name of the job file is X, then X.output will contain the
task's output and will appear in the output directory. X.status will contain the standard error stream
output by the job and will also appear in the output directory.

The  hosts  to  use  for  remote  processing  can  be  found  in  the  hosts  file  (default  name  :  file  "hosts"  in  the 
current  directory). 
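The overall behaviour described in this overview can be simulated with a short Python sketch. It is a deliberate simplification made for illustration: jobs are handed out in rounds rather than as individual hosts become free, `run_job` is a hypothetical callback standing in for the remote command, and an outcome of `None` stands for a job that apparently aborted.

```python
from collections import deque


def dispatch(job_names, hosts, run_job):
    """Simulate the dispatcher: assign jobs to hosts and rerun aborted ones.

    `run_job(host, name, job_number)` returns "success", some other value for
    a failure, or None if the job apparently aborted. Returns a pair
    (all_succeeded, log).
    """
    pending = deque(sorted(job_names))   # jobs are handed out alphabetically
    job_number = 0
    log = []
    all_succeeded = True
    while pending:
        # One round: each host slot takes one pending job, first host first.
        round_jobs = [(host, pending.popleft()) for host in hosts[:len(pending)]]
        for host, name in round_jobs:
            job_number += 1              # every scheduling gets a fresh number
            outcome = run_job(host, name, job_number)
            log.append((job_number, host, name, outcome))
            if outcome is None:
                pending.append(name)     # aborted: schedule the work again
            elif outcome != "success":
                all_succeeded = False
    return all_succeeded, log
```

As in the real system, a job that aborts every time keeps this loop running forever, so a practical variant would bound the number of retries.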

Dispatcher  Use  Example 

cli:wilding[12]% ls jobs
job1  job2  job3  job4  job5
cli:wilding[13]% more jobs/job2
hostname; date; rsh cli date
cli:wilding[14]% more hosts
anderson
client12
oscar
cli
cli:wilding[15]% akcl

AKCL (Austin Kyoto Common Lisp)  Version(1.57) Thu Sep 29 21:27:15 CDT 1988
Contains Enhancements by W. Schelter

>(load "/local/src/parallel/from-nqthm.lsp")

Loading /local/src/parallel/from-nqthm.lsp
Finished loading /local/src/parallel/from-nqthm.lsp
T

>(load "/local/src/parallel/dispatch")

Loading /local/src/parallel/dispatch.o
start address -T 1c8800 Finished loading /local/src/parallel/dispatch.o
25392

>(dispatcher :command-name "sh"
             :completion-function #'(lambda (x) (cons 'success nil))
             :front-end "" :back-end "")

Hosts requested: ("cli" "anderson" "client12" "oscar").
Hosts currently blocked: NONE.

starting»   cli : job1 (job# 1)         0:00
starting»   anderson : job2 (job# 2)    0:00
starting»   client12 : job3 (job# 3)    0:00
starting»   oscar : job4 (job# 4)       0:00
completed»  cli : job1 (job# 1)         0:00
completed»  anderson : job2 (job# 2)    0:00
completed»  client12 : job3 (job# 3)    0:00
completed»  oscar : job4 (job# 4)       0:00
starting»   cli : job5 (job# 5)         0:00
completed»  cli : job5 (job# 5)         0:01
T

>(bye)
Bye.
cli:wilding[16]% ls output
job1.output  job2.output  job3.output  job4.output  job5.output
job1.status  job2.status  job3.status  job4.status  job5.status
cli:wilding[18]% more output/job2.output
16538
anderson
Tue Feb 21 12:47:45 CST 1989
Tue Feb 21 12:56:29 CST 1989
cli:wilding[19]%


Dispatcher  Invocation 


A  dispatcher  invocation  has  the  form 


                                                        default

(dispatcher :jobs-directory-name   <directory-name>     "jobs/"
            :output-directory-name <directory-name>     "output/"
            :hosts-file-name       <file-name>          "hosts"
            :local-host-first      <t | nil>            t
            :delay                 <number>             15
            :kill-if-no-progress   <number>             600
            :command-name          <command-name>       "pc-nqthm"
            :completion-function   <function>           #'nqthm-completed
            :front-end             <file-name>          "/local/src/parallel/front-end.lsp"
            :back-end              <file-name>          "/local/src/parallel/back-end.lsp"
            :nice-flag             <t | nil>            t
            :job-list              <job-list | nil> )   nil


Most of these parameters are the same as those to DO-FILE-PARALLEL (see section 2.2), with the
following exceptions:

• :hosts-file-name Like the :hosts-file-name parameter to DO-FILE-PARALLEL, except that
if the file does not exist an error occurs.

• :completion-function This is a function to be applied to the status file of a job to tell
whether it completed successfully or not. The status file contains the output from the
standard error stream of the job. This function will be applied when the remote process
terminates. It is expected to return either nil (the job apparently aborted) or a pair of the
form ('success . message) or (code . message) where code may be anything except 'success,
and message may either be nil or a list suitable as the arglist for the function FORMAT. If
this function returns nil for a job, that job is rescheduled. If all input jobs eventually
produce status files that indicate success in this sense, then the dispatcher returns t; otherwise
the dispatcher returns nil.

• :front-end This file is sent to the remote process as input before the job input file. If the
empty string is used, then no front-end file is sent.

• :back-end This file is sent to the remote process as input after the job input file. If the
empty string is used, then no back-end file is sent.

• :job-list If non-nil, this provides a list of job names in the jobs subdirectory to be run. This
may be useful if only a subset of the jobs subdirectory is to be used, or if the jobs are to be
assigned in a particular order. If nil, all jobs in the jobs directory will be assigned in
alphabetical order.
3. Systems guide: implementation

This  section  contains  a  description  of  our  implementation.  Let  us  re-emphasize  that  it  should  be 
completely  unnecessary  to  read  this  section  if  one  simply  wants  to  be  able  to  use  the  system.  Rather,  we 
have  included  this  section  for  those  who  would  like  to  know  how  this  all  works  at  the  lower  levels, 
perhaps  so  that  they  can  create  variants  of  this  system.  One  might  even  think  of  this  section  as 
documentation  for  the  code;  the  code  itself  may  be  found  in  Appendix  C. 

The first subsection below is a guide to the implementation of the dispatcher, which has nothing to do
with NQTHM but is a general-purpose program for running independent jobs in parallel. (The
dispatcher’s use is documented in Subsection 2.4.) The second subsection below describes the
implementation of the parallel version of the Boyer-Moore prover on top of the dispatcher. We conclude
this section with an explanation of the system front-end file.

3.1  Dispatcher  implementation 

The  main  function  for  the  general  dispatcher  is  the  Lisp  function  DISPATCHER.  The  code  and  its 
comments  (see  Appendix  C)  are  the  ultimate  reference.  In  this  section  we  describe  the  algorithm  it  uses 
and  some  of  the  subsidiary  functions. 

Dispatcher  algorithm: 

1. Initialize various environmental variables. These include the starting time, the local host
name, the user name, and the suffix to use for the filename where statistics of the run will be
collected, e.g. the '567489' in "/local/src/parallel/statistics/runs/wilding.567489".

2.  Set  up  locking.  Only  one  run  of  the  dispatcher  is  allowed  in  a  given  directory  at  a  given 
time,  in  order  to  avoid  clashing  use  of  common  directories.  The  file  "lock-out-others.par"  is 
created  in  the  current  working  directory  whenever  the  dispatcher  is  entered.  The  dispatcher 
starts  by  checking  to  see  if  the  directory  is  already  "locked"  in  this  sense;  if  not,  it  locks  the 
directory. 

3.  Check  if  jobs  exist.  If  a  job-list  is  provided,  make  sure  that  the  jobs  all  exist  in  the  jobs/ 
subdirectory. 

4. Set up the jobs. The job names are simply the file names from the directory
jobs-directory-name ("jobs/" by default), unless they are provided by the :job-list
option.

5. Set up the initial hosts-jobs-alist. This is an association list which associates jobs
with hosts. Initially each host is associated with nil, indicating that no job has yet been
assigned.

6.  Enter  main  loop. 

• Update completed jobs records. Remove the terminated jobs from the
hosts-jobs-alist. Tack those that didn’t complete back on to the end of the
list of unassigned jobs. (More on this below.)

• Possibly look for and kill bombed jobs. If it has been longer than
kill-if-no-progress seconds since we last looked for "bombed jobs", then
kill all the jobs which haven’t output any characters in the last
kill-if-no-progress seconds and put them back on the list of unassigned
jobs. In such cases, a message headed with "*** KILLED" will appear on the
terminal.

• Assign jobs. Assign jobs to hosts which are currently not busy, appropriately
adjusting the hosts-jobs-alist and the list of unassigned jobs. Avoid hosts
that are currently blocked (except for the local host). Print an appropriate "starting"
message to the terminal for each new job started.

•  Check  for  completion.  If  no  hosts  are  busy,  return  from  the  loop.  Otherwise  sleep 
for  delay  seconds. 

7. Report failed jobs. Return NIL if there are any failed jobs and otherwise return T.

8. Clean up. Remove the lock (i.e. delete the file "lock-out-others.par") and report completion
to statistics files. If execution didn’t complete normally then run parkill to remove all
local jobs owned by user par.*
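
The main loop in step 6 can be modeled compactly. The following is a minimal sketch in Python, not the report's Lisp code: the callbacks start_job and poll_job and the state names are hypothetical stand-ins for the rsh and status-file machinery, and blocked hosts and the killing of bombed jobs are left out.

```python
import time


def run_dispatcher(jobs, hosts, start_job, poll_job, delay=0.0, sleep=time.sleep):
    """Simplified model of the dispatcher main loop.

    start_job(host, job) launches a job (hypothetical callback).
    poll_job(host, job) returns "running", "completed", "failed" or
    "aborted" (hypothetical state names, not taken from the report).
    Returns True if no job failed, False otherwise.
    """
    unassigned = list(jobs)
    hosts_jobs = {host: None for host in hosts}  # plays the role of hosts-jobs-alist
    failed = []
    while True:
        # Update completed-jobs records.
        for host, job in hosts_jobs.items():
            if job is None:
                continue
            state = poll_job(host, job)
            if state == "running":
                continue
            hosts_jobs[host] = None
            if state == "aborted":
                unassigned.append(job)  # reschedule at the end of the list
            elif state == "failed":
                failed.append(job)
        # Assign jobs to hosts that are currently not busy.
        for host in hosts_jobs:
            if hosts_jobs[host] is None and unassigned:
                job = unassigned.pop(0)
                hosts_jobs[host] = job
                start_job(host, job)
        # Check for completion: leave the loop when no host is busy.
        if all(job is None for job in hosts_jobs.values()):
            return not failed
        sleep(delay)
```

Passing the callbacks in keeps the scheduling logic separate from the mechanics of starting and checking remote processes, so the loop can be exercised without a network.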

One complicated thing about the code is how jobs are started. The Lisp function SYSTEM (as it exists in
KCL and AKCL at CLInc) takes a string which is then given to the shell to execute. Our function
SYSTEM-JOB-COMMAND produces a string that, when given to SYSTEM, creates a job. This causes
execution of the shell command parcsh, which calls the shell on its arguments after changing
ownership of the process to the user par. The argument list for parcsh is of the form

PAR <host-name> <command-name> <unique-number> <front-end> <job> <back-end>


PAR is a C-shell script (see Appendix C) that does the following:

1. Writes the process number to the file temp/output.n, where n is the <unique-number>
supplied above, i.e. the unique job number.

2. Calls rsh (remote shell) with host <host-name> and command <command-name>, piping
the concatenation of the files <front-end>, <job>, and <back-end> to its standard input
stream. The standard output of this process is sent to the file temp/output.n, and its error
output to the file temp/status.n, where (as above) n is the <unique-number>.

3. Creates the file temp/finish.n (same n as above).

Notice that since we start by writing the process number to the file temp/output.n, we can kill a bombed
job by first reading its process id from the first line of the output file and then issuing the appropriate kill
command.
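
As an illustration of this kill mechanism, here is a small Python sketch (the report's own code is Lisp and csh). The function names are hypothetical, and the choice of SIGTERM is an assumption; the report says only "the appropriate kill command".

```python
import os
import signal


def read_job_pid(output_path):
    """Return the process id recorded on the first line of a job's output file."""
    with open(output_path) as f:
        return int(f.readline().strip())


def kill_bombed_job(output_path, kill=os.kill):
    """Kill the job whose pid heads output_path and return that pid.

    SIGTERM is an assumed signal; the kill argument can be replaced
    with a stub so the logic can be tried without killing anything.
    """
    pid = read_job_pid(output_path)
    kill(pid, signal.SIGTERM)
    return pid
```

For the job2 output file shown in Section 2.4, read_job_pid would return 16538, the first line of that file.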

As mentioned above, we need to update completed jobs records. We have just explained that PAR uses a
remote shell call (i.e., rsh) to fire up a job on a remote machine, after which it creates a "finish" file.
One may consider a job to be terminated if its corresponding "finish" file has been created. (The Lisp
function job-completed, which should perhaps be called job-terminated, does this check.)
Such a "terminated" job is to be removed from hosts-jobs-alist. But first it must be decided
whether the job completed or not; if not, it should be put back on the list of unassigned jobs. This
determination is up to the completion function, which by default is the function nqthm-completed.
Recall from Subsection 2.4 that this function expects a file name, which in this case is the status file
temp/status.n (where n is the job’s unique number), and returns either nil (which means that it was
"unable to give a reliable answer") or a pair of the form ('success . message) or (code .
message). In the former case (where NIL is returned) the job is considered to have failed to complete
(and the message "***NOT completed" is printed out), and it is put on the list of unassigned jobs.
Otherwise the job is considered to have completed (and the message "completed" is printed out), the
message (if any) is printed out, and the output and status files are moved to the output/ subdirectory. If
the first component of this pair is anything other than 'success then the job is added to the list
*failed-job-names*. When the dispatcher finally returns, if this list is not NIL then the list is
printed out with an appropriate message and the dispatcher returns nil. If all jobs succeed, then the
dispatcher returns T.
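
The three outcomes described in this paragraph can be summarized in a few lines. This is a sketch in Python rather than the report's Lisp; the function name and the use of plain lists and strings are assumptions made for illustration, and moving the output and status files is omitted.

```python
def handle_terminated_job(job, result, unassigned, failed_job_names):
    """Record the outcome of a terminated job and return the label to print.

    result is what the completion function returned: None (no reliable
    answer), or a (code, message) pair in which code == "success"
    stands for the Lisp symbol 'success.
    """
    if result is None:
        unassigned.append(job)  # will be scheduled again later
        return "***NOT completed"
    code, _message = result
    if code != "success":
        failed_job_names.append(job)  # completed, but counted as failed
    return "completed"
```

Note that a job that completes with a failure code is not rescheduled; only a job with no reliable answer goes back on the list of unassigned jobs.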


* In many cases this will kill remote par jobs as well.
3.2 Parallel nqthm implementation on top of the dispatcher

As in the previous subsection, we leave to the code documentation the task of giving detailed
specifications. What follows here is an overview of the execution of the main function,
DO-FILE-PARALLEL.

1. Set up locking. This works just as it did in the dispatcher. We want the current working
directory reserved for only this run of DO-FILE-PARALLEL since even before the
dispatcher is called we will be writing to one subdirectory, namely (by default) jobs/.

2. Check that appropriate files and directories exist. These include the system’s
front-end file, the jobs/ subdirectory, the output/ subdirectory, the temp/ subdirectory,
and the hosts file (or whatever the user supplied in place of these defaults).

3. Set delay. If the :delay keyword argument has not been supplied by the user, then set the
delay to the granularity of the call to DO-FILE-PARALLEL.

4. Reset. Set the *current-job-unique-number* back to 0 and clear the relevant
subdirectories (these are jobs/ and output/ by default, together with temp/).

5. Create the jobs files. These are the input files to be shipped to the remote hosts in between
the front end and the back end. Note that jobs with events which follow a BOOT-STRAP or
NOTE-LIB in the main file are suffixed with a natural number, i.e. they look like
<identifier>.n where n is the position of the applicable BOOT-STRAP or NOTE-LIB
in the main file.

6. Check directory. Be sure that the current working directory is what we started with; if not,
change to it.

7. Remove the locking. Otherwise we won’t be able to run the dispatcher!

8. Run dispatcher. Return what it returns and print a happy message if it returns T. Clean up
by returning to the working directory that we started with in case that differs from the
current working directory.

The dispatcher is called with :back-end set to the filename argument (for the events list) of the call of
DO-FILE-PARALLEL. We’ll omit discussion of the defaults, as these are documented earlier (see
Section 2.2, DO-FILE-PARALLEL-OPTIONS), except to discuss briefly the function
NQTHM-COMPLETED. Recall from the previous subsection that the :completion-function
argument to the dispatcher takes a filename argument (which is supposed to be the name of a status file)
and returns either NIL or a pair. The function NQTHM-COMPLETED in fact looks for a line that equals
the *output-completed-string*, "Boyer-Moore job terminated", in the given file, and then reads
the next line. If the first 7 characters of that next line are "success" (when converted to lower case), then it
returns the pair ('success . NIL). Otherwise it returns the list ('failure . (<line>)),
where <line> is that line.
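
The behavior just described can be sketched as follows. This is a Python rendering for illustration only, not the Lisp function itself. Returning None when the marker line is missing, or when nothing follows it, is an assumption; the text says only that the function returns either NIL or a pair.

```python
OUTPUT_COMPLETED_STRING = "Boyer-Moore job terminated"


def nqthm_completed(status_path, marker=OUTPUT_COMPLETED_STRING):
    """Classify a job from its status file.

    Returns ("success", None), ("failure", [line]), or None when no
    line equal to marker is followed by another line (assumed case).
    """
    with open(status_path) as f:
        lines = [line.rstrip("\n") for line in f]
    for i, line in enumerate(lines[:-1]):
        if line == marker:
            next_line = lines[i + 1]
            if next_line[:7].lower() == "success":
                return ("success", None)
            return ("failure", [next_line])
    return None
```

A different marker can be passed for other kinds of jobs, which mirrors the way Appendix B resets *output-completed-string* to "Compile job terminated" for the compilation experiment.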

Note that the appropriate messages to the status file are placed there by the remote job. The top-level loop
function PAR-NQTHM-TOP-LEVEL in the system’s front-end file in fact uses a system call to echo2 to
print the string "FAILURE: The event <event-name> failed." to the error stream (and hence to the status
file) when an event fails, where <event-name> is the name of the failed event; otherwise it prints "Success!!"

3.3 The system front-end file

Recall that the default "front-end" file for DO-FILE-PARALLEL is the file
/local/src/parallel/front-end.lsp. The file /local/src/parallel/front-end-with-doc.lsp is a version of that file
with comments, so complete documentation may be found in that code. In this subsection we
describe that code (which may be found in Appendix C).

Recall that the front-end file is the first file sent into the standard input stream of the nqthm or
pc-nqthm process. That is, a remote host will be reading in and executing the forms from this file.
After all the forms in this file are read, the particular job file will be read in. The last form in the job file
is (PAR-NQTHM-TOP-LEVEL), where the function PAR-NQTHM-TOP-LEVEL is defined in the front-end
file. It is a top-level loop which will process the forms in the back-end file, i.e. the file of events. The
central thing to understand from the front-end file is the definition of PAR-NQTHM-TOP-LEVEL. That
function executes a loop after which it "cleans up".* Here is what its main loop does.

1.  Read  the  next  form. 

2. If there are no more events to process, return T. There are no more events to process if
we are either (a) at end-of-file or (b) at the event *finish-name* (set in the job file to be
the first event that we should not process).

3. Print the next event, evaluate it, and print its value. However, we turn PROVE-LEMMAs
into ADD-AXIOMs until we find the starting event. The variable
*start-name* is initialized to the starting event’s name in the job file. In case there is a
preceding BOOT-STRAP or NOTE-LIB the variable *start-position* is also set in
that file, and all events before the appropriate BOOT-STRAP or NOTE-LIB are ignored.

4. If the value is NIL, exit the loop with value NIL.
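
The loop above can be sketched in a few lines. The sketch below is in Python rather than the Lisp of the front-end file. Events are modeled as tuples whose first element is the event kind and whose second is the event name; the evaluate callback is a hypothetical stand-in for evaluating an event. The *start-position* handling for BOOT-STRAP and NOTE-LIB is left out, and the rewrite of a PROVE-LEMMA keeps only the name and the next two arguments, which is an assumption about the argument layout.

```python
def par_top_level(forms, start_name, finish_name, evaluate):
    """Process events in order; return True on success, None on failure.

    forms: iterable of tuples such as ("PROVE-LEMMA", "NAME", ...).
    evaluate(form) returns None to signal that the event failed.
    """
    started = False
    for form in forms:
        kind, name = form[0], form[1]
        if name == finish_name:
            break  # first event that this job should not process
        if name == start_name:
            started = True
        if kind == "PROVE-LEMMA" and not started:
            # Before the starting event, assume the lemma instead of proving it.
            form = ("ADD-AXIOM",) + tuple(form[1:4])
        if evaluate(form) is None:
            return None
    return True
```

With events A, B, C and a job that starts at B and finishes at C, only B is actually proved; A is taken as an axiom and C is left to a later job.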

The first part of the "cleanup" phase has already been described in the subsection above: A success
message is printed to the error stream if all events evaluated to non-NIL, and otherwise a failure message
is printed. (We also handle the case that a read fails, i.e. an end-of-file is encountered during the
process of reading the next form.) Finally, we exit in the C language tradition, i.e. with status 0 if all
events evaluated to non-NIL and 1 otherwise. Our current implementation does not use that status
information, however.

The only slightly tricky part of this strategy is that the "cleanup forms" are not evaluated in Lisp when an
error is caused until control is returned to the built-in top-level loop. Fortunately, in KCL there is a global
variable *break-enable* which one may initialize to NIL in order to avoid entering the break loop
when an error occurs. This is the first thing we do in the front-end file.

The front-end file also contains the form (setq sys::*notify-gbc* t), which turns on garbage-collection
notification. This feature should make it virtually impossible for an NQTHM job to "bomb"
simply because it’s not putting out characters fast enough; if all other output is slow, still there are likely
to be frequent garbage collection messages!**


* In Lisp jargon, it executes the cleanup-forms of an UNWIND-PROTECT.

** One exception is compilation, where certain phases of the code generation can take a long time without a garbage collect. In this
case one may wish to specify a large number of seconds for the :kill-if-no-progress parameter, as illustrated in the
example in Appendix B, where we use 1200.
4. Results and Conclusions


4.1  Trial  Runs 

We’ve run several tests to try to break the error handling capability of the system. These included killing
processes in the middle of a job, adding sleep commands to jobs to make them "killable", and even
turning off a remote processor before it finishes. In all these tests the problem was detected and the
parallel job recovered.

The  dispatcher’s  utility  has  been  demonstrated  separately  from  the  problem  of  doing  NQTHM  runs  in 
parallel.  It  was  used  to  compile  code  in  parallel.  That  experiment  is  described  in  Appendix  B. 

We’ve run several different nqthm files for testing. The largest nqthm job we’ve run in parallel so far has
been the events that create the various shared libraries created by Bill Bevier. With 7 Sun 3/60 processors
(including the processor that ran the dispatcher) the job took 2 hr 22 min, compared with 10 hr 31 min for
one dedicated processor running pc-nqthm from the shell.

The speedup of about 4.4 is about 63% of the theoretical maximum of 7. The following lines from the
parallel run’s output show that only a fairly small portion of the 37% loss is due to uneven finish times
of the jobs.


completed>> scarab : I (job# 63) 2:08
completed>> jingles : PUT-WITH-LARGE-INDEX (job# 52) 2:09
completed>> elgin : TIMES-DISTRIBUTES-OVER-DIFFERENCE (job# 64) 2:09
completed>> decaf : FOIS-WITH-LARGE-INDEX (job# 55) 2:10
completed>> oscar : QUOTIENT-DIFFERENCE-LESSP-ARG2 (job# 57) 2:17
completed>> client12 : PUIS-POIS3 (job# 54) 2:21
completed>> client13 : QUOTIENT-DIFFERENCE1 (job# 58) 2:21

All events have been run successfully.
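
The figures quoted above can be checked with a little arithmetic, shown here as a short Python sketch. The idle-time estimate is our own reading of the listing, not a measurement from the report: it assumes that each of the seven lines is the last job on its host and that a host sits idle from its last completion until the final completion at 2:21.

```python
def to_minutes(hours, mins):
    """Convert an h:mm duration to minutes."""
    return 60 * hours + mins


sequential = to_minutes(10, 31)  # one dedicated processor
parallel = to_minutes(2, 22)     # seven processors
processors = 7

speedup = sequential / parallel      # about 4.44
efficiency = speedup / processors    # about 0.63

# Last completion time on each host, taken from the listing above.
finish_times = [to_minutes(2, m) for m in (8, 9, 9, 10, 17, 21, 21)]
last = max(finish_times)
idle = sum(last - t for t in finish_times)  # processor-minutes spent idle
idle_fraction = idle / (processors * last)  # about 0.05
```

Under these assumptions, uneven finish times account for roughly 5 percentage points, which is consistent with the statement that they explain only a small part of the 37% loss.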
4.2 Future Work

There  are  several  things  we’d  like  to  do  (someday)  that  would  increase  the  utility  of  this  coarse  approach 
to  parallelism  in  NQTHM. 

•  Integrate  with  J  Moore’s  library  utilities.  When  NQTHM  with  efficient  library  utilities  is 
released,  it  will  have  two  possible  impacts  on  our  system.  First,  it  may  allow  remote 
systems  to  avoid  redoing  the  DEFNS  and  old  PROVE-LEMMAS  for  each  job.  Second,  and 
more  importantly,  it  will  be  very  desirable  for  our  system  to  produce  endorsed  "books". 

• Include the notion of dependencies. We should investigate better ways to create the job
files. Some events depend on others, like PROVE-LEMMAs after a BOOT-STRAP, and
some events take longer and should have processing resources devoted to them early.

• Find the bottlenecks. We’re not sure right now what is keeping us from getting better
performance. (63% may be as good as it gets, but we should at least know what the
important factors are.)

• Try some big runs. The system has limits. (100 remote hosts would surely fill the
dispatcher’s process table, for example.) We should find out what these are.

• Try to run remotely. The system has been designed to run on machines that are not in the
local area network. We should try it.
4.3 Conclusions


It’s  not  difficult  to  take  advantage  of  idle  processors. 

GNU  EMACS  with  KCL  running  under  Unix  is  a  wonderful  development  environment. 

The basic idea of running NQTHM event files in parallel by sending remote processors subsequences of
events seems to work fairly well. With large runs we have obtained close to 2/3 of the theoretical
speedup.
Appendix A
Instrumentation 


In  order  to  get  a  handle  on  parallel  job  usage,  some  instrumentation  has  been  added  to  the  code.  There  are 
several  files  that  are  updated  when  parallel  work  is  done. 

A.1 /local/src/parallel/statistics/blocks

This file contains information about the creation and removal of block files. There are 5 types of
messages.

•  USER-BLOCK  A  user  has  blocked  a  processor. 

•  USER-UNBLOCK  A  user  has  unblocked  a  processor. 

•  USER-UNBLOCK-FAILURE  A  user  tried  to  unblock  an  unblocked  processor. 

•  SYSTEM-UNBLOCK  The  system  unblocked  some  processors.  The  list  of  block  files 
follows. 

•  SYSTEM-UNBLOCK-FAILURE  The  system  tried  to  unblock  processors  but  there  were 
none  to  unblock. 

example  block  file: 

+++02/20/89 17:13:10 USER-UNBLOCK kaufmann cli
+++02/20/89 19:15:16 USER-BLOCK wilding cli
+++02/21/89 16:28:16 SYSTEM-UNBLOCK
total 1
-rw-rw-r-- 1 wilding 2 Feb 20 19:06 cli
+++02/21/89 16:28:47 SYSTEM-UNBLOCK-FAILURE
+++02/21/89 16:28:49 USER-BLOCK wilding cli

A.2  /local/src/parallel/statistics/job-info 

This file contains information about the parallel jobs run. The information contains the date and time, the
user, the dispatcher host, the run code number (see the next section), the requested hosts, and the blocked
hosts.

example  job-info  file  excerpt: 

+++02/24/89 17:37:17 PAR-START kaufmann client12 355006
hosts: ("client12" "scarab" "elgin" "client13" "decaf")
blocked: ("anderson" "cli" "jingles")

+++02/24/89 17:48:38 PAR-END kaufmann client12 355006
hosts: ("client12" "scarab" "elgin" "client13" "decaf")
blocked: ("anderson" "cli" "jingles")

+++02/26/89 16:15:17 PAR-START wilding client12 522876
hosts: ("client12" "cli" "anderson" "jingles" "scarab" "decaf"
"client13" "elgin" "oscar")
blocked: NIL

+++02/26/89 16:16:30 PAR-END wilding client12 522876
hosts: ("client12" "cli" "anderson" "jingles" "scarab" "decaf"
"client13" "elgin" "oscar")
blocked: NIL
A.3 /local/src/parallel/statistics/runs

This directory contains a trace of all runs using the parallel system. There is a file in this directory of the
form <user-name>.<code-number> for each parallel run. The code-number is the number found in the
job-info file.* Each file contains the header information of the job-info file plus progress messages that
the user received while he ran the job.

Example runs directory:

cli:wilding[4]% cd ~kaufmann/dispatcher
cli:wilding[5]% cd /local/src/parallel/statistics/runs
cli:wilding[6]% ls
kaufmann.345071 kaufmann.346447 kaufmann.355006 wilding.524027
kaufmann.345195 kaufmann.348576 wilding.522876 wilding.524251
kaufmann.345253 kaufmann.352817 wilding.523562 wilding.524815
cli:wilding[7]% cat wilding.522876
+++02/26/89 16:15:17 PAR-START wilding client12 522876
hosts: ("client12" "cli" "anderson" "jingles" "scarab" "decaf"
"client13" "elgin" "oscar")
blocked: NIL

starting>> client12 : DELETE (job# 1) 0:00
starting>> cli : DELETE-COMMUTATIVITY (job# 2) 0:00
starting>> anderson : PERMUTATIONP-PRESERVES-MEMBER (job# 3) 0:00
starting>> jingles : PERMUTATIONP-TRANSITIVE (job# 4) 0:00
completed>> anderson : PERMUTATIONP-PRESERVES-MEMBER (job# 3) 0:01
completed>> client12 : DELETE (job# 1) 0:01
completed>> cli : DELETE-COMMUTATIVITY (job# 2) 0:01
completed>> jingles : PERMUTATIONP-TRANSITIVE (job# 4) 0:01
+++02/26/89 16:16:30 PAR-END wilding client12 522876
hosts: ("client12" "cli" "anderson" "jingles" "scarab" "decaf"
"client13" "elgin" "oscar")
blocked: NIL
cli:wilding[8]%


* This code-number is actually the file-server's "universal time" in seconds, modulo 1,000,000.
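
To make the footnote concrete, here is a small Python sketch of how such a code number could be derived. It assumes that "universal time" is meant in the Common Lisp sense, i.e. seconds since midnight, January 1, 1900 GMT; the function name is hypothetical.

```python
from datetime import datetime, timezone

# Seconds between 1900-01-01 and 1970-01-01 (GMT): the offset between
# Common Lisp universal time and Unix time.
UNIX_TO_UNIVERSAL = 2208988800


def run_code_number(when):
    """Return universal time in seconds, modulo 1,000,000, for an aware datetime."""
    universal = int(when.timestamp()) + UNIX_TO_UNIVERSAL
    return universal % 1_000_000


# PAR-START stamp 02/26/89 16:15:17, read as CST (UTC-6), i.e. 22:15:17 UTC.
example = run_code_number(datetime(1989, 2, 26, 22, 15, 17, tzinfo=timezone.utc))
```

Under the CST reading, the example evaluates to 523317, a few minutes away from the logged code 522876. That gap is consistent with the footnote's remark that the number comes from the file server's clock rather than the clock that produced the time stamp, though this explanation is our inference.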
Appendix B
Parallel  compilation 


In  this  appendix  we  describe  an  experiment  using  the  dispatcher  for  parallel  compilation  of  NQTHM. 
This  method  could  be  generalized  to  solving  the  problem  of  compiling  arbitrary  systems.  However,  we’ve 
chosen  simply  to  build  a  reasonably  optimal  parallel  NQTHM  compiler  at  this  point.  We  describe  the 
parallel  compiler  and  how  fast  it  is. 

B.1 How Parallel Compilation is Done

The NQTHM code is broken into the file sloop.lisp (which is Bill Schelter’s loop macro definition) and 10
other files. Three of those 10 must be compiled in sequence first. The remaining 7 files may then be
compiled in parallel. The function COMPILE-NQTHM-SEQ defined below is like the existing function
COMPILE-NQTHM except that it doesn’t compile sloop.lisp or the final 7 files.

(DEFUN COMPILE-NQTHM-SEQ ()
  ;;; **** This is all done before saving a core image
  (FLET ((LF (N)
           (LOAD (EXTEND-FILE-NAME N FILE-EXTENSION-BIN)))
         (CF (N)
           (COMPILE-FILE (EXTEND-FILE-NAME N FILE-EXTENSION-LISP))))
    (PROCLAIM-NQTHM-FILES)
    (system "date")
    (format t "~%Completed proclaiming nqthm files.~%")
    ;;; (CF "sloop") ***** We assume that sloop exists in any reasonable system!
    (LF "/local/src/nqthm/sloop")
    (CF "basis")
    (LF "basis")
    (CF "genfact")
    (LF "genfact")
    (CF "events")
    (LF "events")))

We don’t compile sloop.lisp because the object file sloop.o does not change very often (i.e. we view it as
being a file provided by the Lisp system).

The  idea  is  to  save  a  core  image  after  compiling  and  loading  the  "sequential"  files,  and  then  run  that  core 
image  in  parallel,  compiling  one  of  the  remaining  7  files  in  each  job.  The  top-level  call  of  this  compiler  is 
a  shell  command  that  calls  akcl  twice:  first  for  the  initial  sequential  compilation  of  the  first  3  files,  using 
the  function  COMPILE-NQTHM-SEQ  shown  above,  and  then  for  the  parallel  compilation  of  the 
remaining  7  files. 

The file nqthm-par.lisp is the same as the existing Boyer-Moore file nqthm.lisp, except that it includes the
definition of COMPILE-NQTHM-SEQ and contains the following definition.

(DEFUN COMPILE-ONE-NQTHM-FILE (filename)
  ;; **** This is all done after saving a core image from (compile-nqthm-seq)
  ;; Hence we may assume that all the proclamations are already around.
  (COMPILE-FILE (EXTEND-FILE-NAME filename FILE-EXTENSION-LISP)))


Other than nqthm-par.lisp, the following files comprise the crucial parallel compiler code.

The file compile-nqthm-shell-script:

date
akcl < compile-nqthm-seq.lisp
date
rsh scarab w
rsh elgin w
rsh client13 w
rsh decaf w
rsh client12 w
date
rm /usr/home/kaufmann/compiletest/output/*
rm /usr/home/kaufmann/compiletest/temp/*
akcl < compile-nqthm-par.lisp
date


The file compile-nqthm-seq.lisp:

(when (probe-file "lock-out-others.par")
  (error "Dispatcher won't work -- please remove lock-out-others.par"))

;; nqthm-par.lisp is just like nqthm.lisp, except that instead of
;; COMPILE-NQTHM it has COMPILE-NQTHM-SEQ and COMPILE-ONE-NQTHM-FILE,
;; and also LOAD-NQTHM is omitted and sloop is loaded but not compiled.
(load "nqthm-par.lisp")
(system "date")
(format t "~%Begin compiling sequential part of nqthm~%")
(compile-nqthm-seq)
(system "date")
(format t "~%Save core image~%")
(save "/usr/tmp/nqthm-compilation-midpoint")


The file compile-nqthm-par.lisp:

(load "/local/src/parallel/top.lsp")
(load-dispatcher)
(system "date")
(format t "~%Now starting dispatcher run~%")
(setq *output-completed-string* "Compile job terminated")
(dispatcher :command-name "/usr/tmp/nqthm-compilation-midpoint"
            ;; use nqthm compilation function
            :front-end "compile-nqthm-front-end.lisp" :back-end ""
            :kill-if-no-progress 1200
            :job-list '("code-1-a" "code-b-d" "code-e-m" "code-n-r" "code-s-z"
                        "ppr" "io"))
The file compile-nqthm-front-end.lisp:

(setq sys::*notify-gbc* t)
(defvar *output-completed-string*)
(setq *output-completed-string* "Compile job terminated")
(defun format2 (string &rest args)
  (system
   (concatenate 'string
                "echo2 '"
                (apply #'format nil string args)
                "'")))
(defun format-nqthm-status (string &rest args)
  (apply #'format2 (concatenate 'string *output-completed-string* "~%" string) args))
(SETQ *DEFAULT-NQTHM-PATH* "/usr/home/kaufmann/compiletest/")


A typical job, e.g. jobs/code-1-a:

(compile-one-nqthm-file "code-1-a")
(cond (*break-enable*
       (if (probe-file "/usr/home/kaufmann/compiletest/code-1-a.o")
           (format-nqthm-status "Success!!")
           (format-nqthm-status "FAILURE -- did not end in a break, but file does not exist.")))
      (t (format-nqthm-status "FAILURE -- ended in a break.")))


B.2  Results  from  Using  the  Parallel  Compiler 

Summary of times

Total sequential time: 2688 sec.
Total parallel time:   1191 sec.
Real Speedup: (/ 2688 1191.0) = 2.26

Parallel run breakdown (times in seconds):

load nqthm-par.lisp               4
Proclaim                        109
compile-load sequential part    226
save core image                 124
[rsh ... w]                      24   (To see which processors were busy - none were)
load dispatcher code             13
Run dispatcher                  691

Total                          1191
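
The totals in this summary can be re-derived from the breakdown. The few lines of Python below do so; the dictionary keys are simply the labels from the table above.

```python
# Phase times of the parallel run, in seconds, as listed in the breakdown.
breakdown = {
    "load nqthm-par.lisp": 4,
    "Proclaim": 109,
    "compile-load sequential part": 226,
    "save core image": 124,
    "[rsh ... w]": 24,
    "load dispatcher code": 13,
    "Run dispatcher": 691,
}

total_parallel = sum(breakdown.values())           # 1191
total_sequential = 2688
real_speedup = total_sequential / total_parallel   # about 2.26

# Share of the parallel run spent inside the dispatcher itself.
dispatcher_share = breakdown["Run dispatcher"] / total_parallel  # about 0.58
```

The same phase times can be read off the time stamps in the shell transcript; for example, 17:37:08 to 17:48:39 is the 691 seconds of the dispatcher run.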

An  abbreviated  shell  transcript 

17:28:48

>Loading nqthm-par.lisp
Finished loading nqthm-par.lisp
17:28:52
Begin compiling sequential part of nqthm

17:30:41
Completed proclaiming nqthm files.
Loading /local/src/nqthm/sloop.o
start address -T 20e800 Finished loading /local/src/nqthm/sloop.o
Compiling basis.lisp.
start address -T 2b0000 Finished loading events.o
14744

17:34:27
Save core image

17:36:31
[rsh ... w]

17:36:55
AKCL (Austin Kyoto Common Lisp) Version(1.57) Thu Sep 29 21:27:15 CDT 1988
Contains Enhancements by W. Schelter

>Loading /local/src/parallel/top.lsp

17:37:08
Now starting dispatcher run
NIL

Hosts requested: ("client12" "scarab" "elgin" "client13" "decaf").
Hosts currently blocked: ("anderson" "cli" "jingles").

starting>> client12 : code-1-a (job# 1) 0:00
starting>> scarab : code-b-d (job# 2) 0:00
starting>> elgin : code-e-m (job# 3) 0:00
starting>> client13 : code-n-r (job# 4) 0:00
starting>> decaf : code-s-z (job# 5) 0:00
completed>> client12 : code-1-a (job# 1) 0:07
completed>> decaf : code-s-z (job# 5) 0:07
starting>> client12 : ppr (job# 6) 0:07
starting>> decaf : io (job# 7) 0:07
completed>> client12 : ppr (job# 6) 0:08
completed>> scarab : code-b-d (job# 2) 0:09
completed>> elgin : code-e-m (job# 3) 0:09
completed>> decaf : io (job# 7) 0:10
completed>> client13 : code-n-r (job# 4) 0:11
T

>Bye.
17:48:39


One measure of "efficiency" is in terms of how many total CPU seconds are used in the parallel vs. the
sequential run. That is, this measure should be an indication of the overhead in setting up the dispatcher
run. We measure this kind of efficiency in (A) below, with a slight variation in (B). In part (C) we
measure the actual REAL speedup, comparing the parts of the two runs that can actually be made parallel,
i.e. the compilation of the files code-1-a, code-b-d, code-e-m, code-n-r, code-s-z, ppr, io.

(A)  From  the  point  of  view  of  total  CPU  seconds  used  (on  all  processors). 

First  we  calculate  the  expected  total  CPU  seconds  for  a  theoretical  "optimal"  parallel  run.  More  precisely, 
this  is  the  total  seconds  for  the  sequential  compilation  run  together  with  the  additional  operations  done  in 
the  parallel  run  that  don’t  correspond  to  actions  taken  in  the  sequential  run.  (Notice  that  our  notion  of 
"additional  operations"  does  not  include  time  required  to  fire  up  processes  or  other  overhead  incurred  in 
running  the  dispatcher.) 
[total sequential run time] + [save core image] + [rsh ... w] + [load dispatcher code] =

2688 + 124 + 24 + 13 =

2849


On the other hand, if we note early completions,

completed»  client12 : ppr (job# 6)         0:08
completed»  scarab   : code-b-d (job# 2)    0:09
completed»  elgin    : code-e-m (job# 3)    0:09
completed»  decaf    : io (job# 7)          0:10
completed»  client13 : code-n-r (job# 4)    0:11

we get the estimated total CPU seconds for the entire actual parallel run by subtracting off the minutes
that all but client13 sat idle (note the gross rounding here!):

[(total parallel run time) - (dispatcher_time)] + (* 5 (dispatcher_time))
   - (* 60 (3+2+2+1)) =

(- 1191 691) + (* 5 691) - 480 =

3475


Estimated actual efficiency from the point of view of total CPU seconds used:

(/ 2849.0 3475) = 82%.
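
The calculation in (A) can be retraced as follows. This is a sketch under stated assumptions: the names *LAST-COMPLETION-MINUTE*, IDLE-MINUTES, OPTIMAL-CPU-SECONDS and ACTUAL-CPU-SECONDS are introduced here for illustration only, and the completion minutes are read off the transcript with the same gross rounding to whole minutes used in the text.

```lisp
;; Illustrative sketch only; none of these names occur in the dispatcher sources.
;; Minute at which each host finished its last job, read off the transcript.
(defparameter *last-completion-minute*
  '(("client12" . 8) ("scarab" . 9) ("elgin" . 9) ("decaf" . 10) ("client13" . 11)))

(defun idle-minutes (completions)
  "Total whole minutes that hosts sat idle while waiting for the last host."
  (let ((finish (reduce #'max completions :key #'cdr)))
    (reduce #'+ completions
            :key (lambda (pair) (- finish (cdr pair))))))

(defun optimal-cpu-seconds (sequential save-core rsh-check load-dispatcher)
  "Sequential run time plus the extra operations done only in the parallel run."
  (+ sequential save-core rsh-check load-dispatcher))

(defun actual-cpu-seconds (parallel-total dispatcher-time hosts idle-min)
  "Non-dispatcher time, plus dispatcher time on every host, minus idle time."
  (- (+ (- parallel-total dispatcher-time)
        (* hosts dispatcher-time))
     (* 60 idle-min)))

(let ((optimal (optimal-cpu-seconds 2688 124 24 13))
      (actual (actual-cpu-seconds 1191 691 5
                                  (idle-minutes *last-completion-minute*))))
  (format t "optimal ~D, actual ~D, efficiency ~,2F~%"
          optimal actual (/ (float optimal) actual)))
```

With the figures above, the idle time comes to 3+2+2+1 = 8 minutes, and the two totals are 2849 and 3475 seconds, which gives the 82% quoted.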

(B) From the point of view of total CPU seconds used (on all processors), but only for the part of
compilation potentially done in parallel

Sequential (Real) Time for code-1-a to the end (note nqthm-par.lisp and nqthm.lisp are similar in load
times): Roughly,

[sequential total] - {[load nqthm-par.lisp] + [Proclaim]
   + [Compile-load sequential part]} =

(- 2688 4 109 226) =

2349

Parallel run (as explained above, roughly 8 minutes less than (* 5 691)):

(- (* 5 691) (* 60 8)) =

2975

Efficiency: (/ 2349 2975.0) =

79%
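
The figures in (B) can be rechecked in the same way. The function names below are hypothetical and serve only to label the two quantities being compared; the numbers are those given in the text.

```lisp
;; Illustrative sketch only; these names do not occur in the dispatcher sources.
(defun parallelizable-sequential-seconds (sequential-total load proclaim compile-load)
  "Sequential time left once the inherently sequential setup is removed."
  (- sequential-total load proclaim compile-load))

(defun parallel-cpu-seconds (hosts dispatcher-time idle-min)
  "CPU seconds spent by all hosts during the dispatcher run, less idle time."
  (- (* hosts dispatcher-time) (* 60 idle-min)))

(let ((seq (parallelizable-sequential-seconds 2688 4 109 226))
      (par (parallel-cpu-seconds 5 691 8)))
  (format t "sequential ~D, parallel ~D, efficiency ~,2F~%"
          seq par (/ seq (float par))))
```

This reproduces 2349 and 2975 seconds, whose ratio rounds to the 79% stated above.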

(C) From the point of view of looking at REAL TIME, but only for the compilation potentially done
in parallel:

Sequential run, code-1-a to the end (as shown in (B) above):

2349

Parallel Real Time for running dispatcher:

691

REAL speedup for code-1-a to the end:

(/ 2349 691.0) =
3.40

Efficiency for code-1-a to the end:

(/ 3.40 5) = 68%
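
As a check on (C), the speedup and the per-host efficiency follow directly from the two real times. The names SPEEDUP and PER-HOST-EFFICIENCY are introduced here for illustration only.

```lisp
;; Illustrative sketch only; these names do not occur in the dispatcher sources.
(defun speedup (sequential-seconds parallel-seconds)
  "Real-time speedup of the parallel run over the sequential run."
  (/ sequential-seconds (float parallel-seconds)))

(defun per-host-efficiency (speedup-value hosts)
  "Fraction of the ideal HOSTS-fold speedup that was actually achieved."
  (/ speedup-value hosts))

(let ((s (speedup 2349 691)))
  (format t "speedup ~,2F, efficiency ~,2F~%"
          s (per-host-efficiency s 5)))
```

Five hosts were requested, so an ideal run would show a speedup of 5. The measured speedup of about 3.40 corresponds to an efficiency of about 68%.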
Appendix C
Code


We give here the contents of the following files in directory /local/src/parallel/: bm.lsp, dispatch.lsp,
top.lsp, front-end-with-doc.lsp, and PAR.


bm.lsp


;;;;;;;;;;;;;;;; CODE TO CREATE BOYER-MOORE INPUT FILES ;;;;;;;;;;;;;;;;


;; An example of what an input stream might look like (whitespace added here).

#|

;; front-end.lsp
(defun do-event ....)

(defun par-nqthm-top-level ...)

;; [and requisite DEFVARs and auxiliaries]

;; jobs file
;(defun no-io () nil)

;(setq io-fn #'no-io)

;(setq blurb-flg nil)

(defvar *output-completed-string* "Boyer-Moore job terminated")

(setq *start-name* 'DELETE)

(setq *finish-name* 'MEMBER-DELETE)

(setq *start-position* 0)

(par-nqthm-top-level)


;; back-end.lsp
[actual events]

|#


;;;;;;;;;;;;;;;; VARIABLE DECLARATIONS ;;;;;;;;;;;;;;;;


(defvar *output-completed-string* "Boyer-Moore job terminated")


;;;;;;;;;;;;;;;; FLAGS ;;;;;;;;;;;;;;;;


(defvar *no-io-in-parallel-flag* nil)

(defvar *delete-jobs-flg* nil) ;if non-NIL, delete the files in JOBS/.


m  wuT 


(defun create-jobs-files (infile sec-size jobs-directory-name)

  ;; Here INFILE is the list of nqthm events, SEC-SIZE is the granularity, i.e. the
  ;; number of events to be run each time (after bringing the chronology up to speed),
  ;; and JOBS-DIRECTORY-NAME and OUTPUT-DIRECTORY-NAME are the subdirectories of the
  ;; current directory for the input and output files (respectively).

  (with-open-file
   (instream infile
             :direction :input
             :if-does-not-exist :error)
   (let ((start-info (start-info instream sec-size)))

     ;; *** We haven't thought enough about how to handle two jobs with the same name.
     ;; Perhaps we should put something here checking for duplicate file names.
     ;; But probably that's better left for when we deal with books and such.
     (iterate for tail on start-info
              when ;a bit of error-checking
              (if (atom (caar tail)) t
                (error "Encountered non-atom in start-names, ~s." (caar tail)))
              collect
              (create-one-job-file (car tail) (cadr tail) jobs-directory-name)))))


(defun start-name-of (form)
  (if (new-dispatch-call form) (car form) (cadr form)))

(defun lemma-form (form)
  ;; This is a recognizer for the class of forms which determine our notion of granularity.
  ;; For example, if CONSTRAIN becomes an event then we should include that too.
  (member (car form) '(prove-lemma lemma) :test #'eq))
(defun new-dispatch-call (form)

  ;; *** Should be modified when we deal with books and such.
  (or (eq (car form) 'boot-strap)
      (eq (car form) 'note-lib)))

(defun start-info (instream sec-size)

  ;; Returns a list of pairs (<event-name> . n), where n >= 0 indicates the starting
  ;; position (from a boot-strap or note-lib call or the beginning of file) for this
  ;; event-name.  (I could return the position as well, but I want the event-names
  ;; to create nice file names, and then the position ought to be irrelevant.)

  ;; Notice that every sec will end with a lemma, except when the accumulation process
  ;; is aborted by a boot-strap or note-lib.

  (iterate for position from 1
           with form and i = 0 and suffix = 0 and ready-flg = t and ans
           ;; i is the number of lemmas we've got so far in the sec we're accumulating
           ;; when ready-flg is set we know we're ready to start a new sec
           while (not (eq (setq form (read instream nil a-very-rare-cons))
                          a-very-rare-cons))
           do (progn
                (cond ((not (and (consp form)
                                 (or (null (cdr form))
                                     (consp (cdr form)))))
                       (error "Encountered atom in event file: ~s" form))
                      ((new-dispatch-call form)
                       (setq ready-flg nil) ;in case we were already ready anyhow
                       (setq i 0)
                       (setq suffix position)
                       (setq ans (cons (cons (start-name-of form) suffix) ans)))
                      (ready-flg
                       (setq ready-flg nil)
                       (setq ans (cons (cons (start-name-of form) suffix) ans)))
                      (t nil))
                (when (lemma-form form)
                  (setq i (1+ i))
                  (when (= i sec-size)
                    (setq i 0)
                    (setq ready-flg t))))
           finally (return (nreverse ans))))

(defun create-one-job-file
  (start-info finish-info jobs-directory-name
   &aux (job-file-name-from-pair (job-file-name-from-pair start-info)))

  ;; Writes out an input file in the subdirectory JOBS-DIRECTORY-NAME
  ;; which is appropriate for nqthm or pc-nqthm.  If START-INFO is
  ;; (<event-name> . <position>), that input file is intended for
  ;; running events in the file from the position <position> in INFILE
  ;; up to (but not including) FINISH-NAME, with output to be written
  ;; to a file in OUTPUT-DIRECTORY-NAME.  Lemmas before the event
  ;; named <event-name> (after <position>), however, are to be taken
  ;; as axioms.

  ;; The function actually returns the new input file's name.

  (with-open-file
   (outstream (concatenate 'string jobs-directory-name job-file-name-from-pair)
              :direction :output)
   (when *no-io-in-parallel-flag*
     (format outstream "(defun no-io () nil)~%")
     (format outstream "(setq io-fn #'no-io)~%")
     (format outstream "(setq blurb-flg nil)~%"))
   (format outstream "(defvar *output-completed-string* ~s)~%"
           *output-completed-string*)
   (format outstream "(setq *start-name* '~s)~%" (car start-info))
   (format outstream "(setq *finish-name* '~s)~%" (car finish-info))
   (format outstream "(setq *start-position* ~s)~%" (cdr start-info))
   (format outstream "(par-nqthm-top-level)~%" (cdr start-info)))
  job-file-name-from-pair)

(defun job-file-name-from-pair (start-info)

  ;; Here, as usual, START-INFO is a pair (symbol . number), where symbol is an event-name
  (if (= (cdr start-info) 0)
      (string (car start-info))
    (concatenate 'string (string (car start-info))
                 (prin1-to-string (cdr start-info)))))
  ;; Returns non-NIL if we find the header *output-completed-string* (which isn't then
  ;; immediately followed by end-of-file)

   ((not (probe-file file-name))
    (cons 'failure
          (list "~%NQTHM FAILURE:  File ~A should exist, but does not! ~%"
                file-name)))
   (t
    (with-open-file (stream file-name)
      (iterate with input-string
               do
               (if (setq input-string (read-line stream nil nil))
                   (when (equal input-string *output-completed-string*)
                     (let ((success-string
                            (read-line
                             stream nil
                             "~%NQTHM FAILURE:  End of status file encountered.")))
                       (cond
                        ((and (> (length success-string) 6)
                              (equal (string-downcase (subseq success-string 0 7))
                                     "success"))
                         (return (cons 'success nil)))
                        (t (return (list 'failure success-string))))))
                 (return nil)))))))

(defun status-file-name-from-string (directory-name job-name)
  (concatenate 'string directory-name job-name ".status"))

(defun current-directory ()
  (directory-namestring (truename "")))

(defun reset-directory (target-directory-name)
  (when (not (equal (current-directory) target-directory-name))
    (sys:chdir target-directory-name)))

(defun do-file-parallel (infile sec-size &key
                                (jobs-directory-name "jobs/")
                                (output-directory-name "output/")
                                (hosts-file-name
                                 (if (probe-file "hosts")
                                     "hosts"
                                   (concatenate 'string *system-parallel-directory*
                                                "hosts.all")))
                                (local-host-first t)
                                (kill-if-no-progress 200)
                                (command-name "pc-nqthm") ;might be "nqthm", etc.
                                (front-end (concatenate 'string *system-parallel-directory*
                                                        "front-end.lsp"))
                                (delay nil)
                                (nice-flag t)
                                &aux (temp-directory-name "temp/")
                                jobs-files (current-directory (current-directory)))

  ;; Runs a Boyer-Moore event list in parallel by creating files
  ;; with CREATE-JOBS-FILES (each of which calls DO-FILE-WITH-SUCCESS)
  ;; and then the dispatcher.  The first four forms in the PROGN
  ;; below relate only to clearing out the input and output subdirectories.

  ;; Returns T if and only if all jobs cause T to be written to the ".status"
  ;; files.  (Recall also that DO-FILE-WITH-SUCCESS causes T to be thus written
  ;; if and only if all of its events complete successfully.)  If any job fails
  ;; in this sense, the failed jobs are reported to the standard output.

  ;; Notice that we do the locking on par jobs here because we work with the
  ;; jobs subdirectory even before starting up the dispatcher.

  (unwind-protect
      (progn

        ;; Get control over the current directory right away.
        (check-lock-on-par-jobs)
        (lock-on-par-jobs)

        ;; Set up the jobs.
        (unwind-protect
            (progn
              (chk-file-or-directory-exists front-end)
              (chk-file-or-directory-exists hosts-file-name)
              (chk-file-or-directory-exists jobs-directory-name t)
              (chk-file-or-directory-exists output-directory-name t)
              (chk-file-or-directory-exists temp-directory-name t)
              (when (null delay) (setq delay sec-size))
<oXMr-dlr«at«xy  job*-diyotc«y»aw 
              (setq *current-job-unique-number* 0)

              ;; Create jobs files.
              (format t "~%Creating jobs ...")
              (setq jobs-files (create-jobs-files
                                infile sec-size jobs-directory-name))

(foxaa^  t  ** 

              ;; Run jobs in parallel.
              (reset-directory current-directory))
          (remove-lock-on-par-jobs))

        ;; Call the dispatcher.
        (format t "~%Invoking dispatcher.~%")
        (when (dispatcher :jobs-directory-name jobs-directory-name
                          :output-directory-name output-directory-name
                          :hosts-file-name hosts-file-name
                          :local-host-first local-host-first
                          :kill-if-no-progress kill-if-no-progress
                          :command-name command-name
                          :delay delay
                          :completion-function #'nqthm-completed
                          :front-end front-end
                          :back-end infile
                          :nice-flag nice-flag)
          (format t "~%All events have been run successfully.~%")
          t)
        )

    ;; Clean up.
    (when *delete-jobs-flg*
      (iterate for x in jobs-files
               do (delete-file x)))
    (reset-directory current-directory)))


;; dispatch.lsp


;; This assumes that nqthm is loaded.  Otherwise you should
;; (load "/usr/local/src/nqthm/sloop") and
;; (defmacro iterate (&rest args) `(sloop::sloop ,@args)).

;; There are probably a couple of other functions below that would
;; need to be defined in akcl, e.g. no-duplicates.

;; We assume that only one parallel job is being run in the current
;; directory at any one time.


;;;;;;;;;;;;;;;; STRUCTURES ;;;;;;;;;;;;;;;;


;; Here is the only structure we introduce:

(defstruct job
  name number)

;;;;;;;;;;;;;;;; FLAGS ;;;;;;;;;;;;;;;;

(defvar *local-host-warning* t)

(defvar *save-temp-files-flg* nil)

(defvar *kill-dispatcher-upon-seeing-failure* nil)

(defvar *clear-query-flg* nil) ;used for CLEAR-DIRECTORY

;;;;;;;;;;;;;;;; VARIABLE DECLARATIONS ;;;;;;;;;;;;;;;;

(defvar *failed-job-names*)

(defvar *system-parallel-directory* "/local/src/parallel/")

(defvar *parkill-command* "/local/bin/parkill")

(defvar *protected-hosts-subdirectory* "protected-hosts/")

;; The following might have been in /tmp, but what if others run dispatcher
;; jobs at the same time?  So we put the "junk" in the current directory,
;; where the locking mechanism should save us.

(defvar *junk-file* "dispatcher-junk.tmp")

(defvar *lock-file-name* "lock-out-others.par")

;; for the run file, e.g. "/local/src/parallel/statistics/runs/wilding.2"
(defvar *long-job-info-filename*)

;; e.g. the number 2 in the pathname above
(defvar *uniq-value*)

(defvar *user-name*) ;e.g. "wilding"

(defvar *short-job-info-filename* ;i.e. "/local/src/parallel/statistics/job-info"
  (concatenate 'string *system-parallel-directory*
               "statistics/job-info"))

(defvar *output-info-subdirectory* "statistics/runs/")

(defvar *start-time*)                   ;set in dispatcher when it's called
(defvar *local-host-name*)              ;used to make the hosts-jobs-alist and to
                                        ;protect against blocking spawns on local host
(defvar *all-valid-host-names* nil)     ;not actually used here, but could be set in an init-file
(defvar *current-job-unique-number* 0)  ;not automatically reset on new dispatcher call
(defvar *file-server-time-disparity*)   ;seconds file server's clock is ahead
                                        ;(could be negative if behind)
;;;;;;;;;;;;;;;; FILE UTILITIES ;;;;;;;;;;;;;;;;


(defun chk-file-or-directory-exists (filename &optional directory-flg)
  (when (not (probe-file filename))
    (if directory-flg
        (error "Bad news -- directory ~A doesn't exist!" filename)
      (error "Bad news -- file ~A doesn't exist!" filename)))
  (when (and directory-flg
             (not (equal (aref filename (1- (length filename))) #\/)))
    (error "Bad news -- directory ~A didn't end with a `/'!" filename)))

(defun non-files-in-directory (directory-name list-of-files)
  (iterate for name in list-of-files
           when (or (not (stringp name))
                    (not (probe-file (concatenate 'string directory-name name))))
           collect name))

(defun system-output (command)

  ;; Executes a shell command and writes its output to the "junk file".
  ;; Should work if command is in syntax you'd give when typing to the shell.
  ;; Probably don't need to clear *junk-file* first because SYSTEM will direct output there.
  (system (concatenate 'string "( " command " ) > " *junk-file*)))

(defun read-one-form (filename)

  ;; returns NIL if file doesn't exist or if the read fails
  (let ((instream (open filename :direction :input :if-does-not-exist nil)))
    (and instream
         (unwind-protect
             (read instream nil nil)
           (close instream)))))

(defun get-system-output (command)

  ;; assumes output is a single line
  (system-output command)
  (with-open-file (infile *junk-file* :direction :input)
    (read-line infile)))

(defun lines-from (filename)

  ;; Returns the list of lines (as strings) from the given file.
  ;; Blank lines are skipped.  The character bag is given as a list because
  ;; "\t" inside a Lisp string denotes the letter t, not a tab.
  (with-open-file (stream filename)
    (iterate with ans and temp
             when
             (progn (setq temp (read-line stream nil nil))
                    (when (null temp) (return ans))
                    (not (equal (setq temp (string-trim '(#\Space #\Tab) temp))
                                "")))
             collect temp into ans)))

(defun format-to-file (filename string &rest args)
  (with-open-file (outfile filename
                           :direction :output
                           :if-does-not-exist :create
                           :if-exists :new-version)
    (apply #'format outfile string args)))

(defun princ-to-end-of-file (filename string)
  (with-open-file (stream filename :direction :output
                          :if-exists :append
                          :if-does-not-exist :error)
    (princ string stream)))

(defun list-directory (directory-name)

  ;; **** We should change this to avoid the problem of two jobs dealing
  ;; with the same junk file, once DIRECTORY works right in AKCL.

  ;; Returns the listing (as strings) of the given directory name,
  ;; sorted by time, most recently written ones being at the end.

  (progn (system-output (format nil "ls '~A'" directory-name))
         (lines-from *junk-file*)))

(defun clear-directory (directory-name &optional query &aux files)

  ;; Returns T unless we decide to abort, and then NIL.

  (when (setq files (list-directory directory-name))
    (cond
     ((or (not query)
          (y-or-n-p "Directory ~A not empty!  OK to clear it?"
                    directory-name))
      ;; (iterate for file in files do
      ;;          (delete-file (concatenate 'string directory-name file)))
      (format t "~%Clearing directory ~A.~%" directory-name)
      (system (format nil "rm ~A/*" directory-name)))
     (t (error "Directory ~A not empty!  Quitting...."
               directory-name)))))

(defun job-file-name (directory-name job)
  (concatenate 'string directory-name (job-name job)))

(defun output-file-name (directory-name job)
  (concatenate 'string directory-name (job-name job) ".output"))

(defun status-file-name (directory-name job)
  (concatenate 'string directory-name (job-name job) ".status"))

(defun temp-finish-file-name (directory-name job)
  (concatenate 'string directory-name "finish." (prin1-to-string (job-number job))))

(defun temp-status-file-name (directory-name job)
  (concatenate 'string directory-name "status." (prin1-to-string (job-number job))))

(defun temp-output-file-name (directory-name job)
  (concatenate 'string directory-name "output." (prin1-to-string (job-number job))))

(defun check-lock-on-par-jobs ()
  (when (probe-file *lock-file-name*)
    (error "Already locked by ~A! ~%~%~
            Sometimes bogus lock files aren't removed -- ~%~
            If you REALLY want to remove the lock file (CAREFUL!!!), ~%~
            (REMOVE-LOCK-ON-PAR-JOBS).~%"
           (file-author *lock-file-name*))))

(defun lock-on-par-jobs ()
  (format-to-file *lock-file-name* "locked~%"))

(defun remove-lock-on-par-jobs ()
  (delete-file *lock-file-name*))

(defun blocked-hosts ()

  ;; just for the user, perhaps
  (list-directory (concatenate 'string *system-parallel-directory*
                               *protected-hosts-subdirectory*)))


;;;;;;;;;;;;;;;; OTHER RANDOM UTILITIES ;;;;;;;;;;;;;;;;


(defun hostname ()
  (get-system-output "hostname"))

(defun username ()
  (get-system-output "whoami"))

(defun set-file-server-time-disparity ()
  (format-to-file *junk-file* "JOBS")
  (setq *file-server-time-disparity*
        (- (file-write-date *junk-file*) (get-universal-time))))

(defun get-file-server-time ()
  (+ *file-server-time-disparity* (get-universal-time)))

(defun output-host-job (status host-job)

  ;; Here STATUS is a string
  (let* ((now (- (get-universal-time) *start-time*))
         (string (format nil
                         (concatenate 'string status
                                      "» ~A : ~A (job# ~D)~70,0T~D:~D~D~%")
                         (car host-job) (job-name (cdr host-job))
                         (job-number (cdr host-job)) (floor now 3600)
                         (mod (floor now 600) 6) (mod (floor now 60) 10))))
    (output-long-job-info string)
    (princ string)
    (force-output)))


•Torr  ; ; ; ; ; ; ;  ; ; ; ; 


(defun digits-to-string (number size)
  (subseq
   (prin1-to-string (+ (expt 10 size) number))
   1 (1+ size)))
(defun initialize-job-info ()
  (setq *start-time* (get-universal-time))
  (set-file-server-time-disparity)
  (setq *local-host-name* (hostname))
  (setq *user-name* (username))
  (setq *uniq-value* (mod (get-universal-time) 1000000))
  (setq *long-job-info-filename* (concatenate 'string *system-parallel-directory*
                                              *output-info-subdirectory*
                                              *user-name* "."
                                              (digits-to-string *uniq-value* 6)))
  (let ((temp))
    (unwind-protect
        (when (not (setq temp (open *long-job-info-filename*
                                    :direction :output
                                    :if-exists nil
                                    :if-does-not-exist :create)))
          (error (concatenate 'string *long-job-info-filename* " exists")))
      (if temp (close temp))))
  (when (not (probe-file *short-job-info-filename*))
    (error (concatenate 'string *short-job-info-filename*
                        " does not exist "))))

(defun output-long-job-info (string)
  (princ-to-end-of-file *long-job-info-filename* string))

(defun output-short-job-info (string)
  (princ-to-end-of-file *short-job-info-filename* string))

(defun job-info-message (message host-names)

  ;; e.g. if message is PAR-START and host-names is ("eli" "client13" "anderson"),
  ;; and anderson is currently the only blocked host, we get
4>K02/17/00  X<:2<:S3  fAR-JUAT  kaoAMaa  eliaatl2  X 
  ;;   hosts: ("eli" "client13" "anderson")
  ;;   blocked: ("anderson")

  (multiple-value-bind
   (sec min hour date month year)
   (get-decoded-time)
   (format nil

"«C>m*D<-D/>D*D/^D«>0  -X  -K  -A  »D**«beata:  »d~%bXoobad? 

           (floor month 10) (mod month 10) ;AKCL bug(?) makes us do contortions
           (floor date 10) (mod date 10)
           (mod (floor year 10) 10) (mod year 10)
           (floor hour 10) (mod hour 10)
           (floor min 10) (mod min 10)
           (floor sec 10) (mod sec 10)
           message *user-name* *local-host-name* *uniq-value*
           host-names
           (list-directory (concatenate 'string *system-parallel-directory*
                                        *protected-hosts-subdirectory*)))))

(defun output-job-info (message host-names)
  (let ((mess (job-info-message message host-names)))
    (output-short-job-info mess)
    (output-long-job-info mess)))


;;;;;;;;;;;;;;;; MAIN DISPATCHER CODE ;;;;;;;;;;;;;;;;


;; The following function, DISPATCHER,
;; takes input files from JOBS-DIRECTORY-NAME
;; and runs COMMAND-NAME on each of these files,
;; using the hosts in HOSTS-FILE-NAME
;; by employing a scheduler which engages roughly every DELAY seconds,
;; and sends the output from each job to the directory TEMP-DIRECTORY-NAME
;; (using the same file names as the input "jobs" files).  If the job
;; completes, then the output file in TEMP-DIRECTORY-NAME is copied
;; to OUTPUT-DIRECTORY-NAME.  Note that COMPLETION-FUNCTION should return
;; either NIL, which indicates that the job died in the middle somehow,
;; or else a pair -- either ('success . <message>) or (<other> . <message>),
;; where <message> is either NIL or a list suitable for FORMAT, i.e. a list
;; of the form (string . args).

;; The auxiliary (&AUX) variables may be thought of as follows.

;; HOSTS-JOBS-ALIST:  An association list with one pair (HOST . JOB)
;;   for each occurrence of HOST in the HOSTS-FILE-NAME.  The JOB
;;   component is either NIL, which indicates that the HOST is
;;   currently idle, or else is a job built using the name of the input file
;;   currently assigned (via the rsh command) to HOST.

;; JOBS-UNASSIGNED:  The list of input files which have not yet been
;;   assigned to any host.
(defun make-local-host-first (host-names local-host-name)
  (iterate for x in host-names
           with ans
           when (not (equal x local-host-name))
           collect x into ans
           finally (return (nconc (iterate for i from 1 to (- (length host-names) (length ans))
                                           collect local-host-name)
                                  ans))))

(defun make-hosts-jobs-alist (host-names local-host-first)

  ;; Note that *local-host-name* is already initialized by the time this is called.

  (cond
   ((not (consp host-names))
    (error "Empty list of hosts"))
   (t
    (when local-host-first (setq host-names (make-local-host-first host-names *local-host-name*)))
    (when (and *local-host-warning*
               (not (member *local-host-name* host-names :test #'equal)))
      (format t "~%WARNING:  This run has been set up such that the local host, ~
                 ~A, is NOT in the host list.~%" *local-host-name*))
    (iterate for host-name in host-names
             when (cond
                   ((or (null *all-valid-host-names*)
                        (member host-name *all-valid-host-names* :test 'equal))
                    t)
                   (t (format t "~%WARNING:  Host name ~A not found in *all-valid-host-names*~%"
                              host-name)
                      nil))
             collect (cons host-name nil)))))

(defun dispatcher (&key (jobs-directory-name "jobs/")
                        (output-directory-name "output/")
                        (hosts-file-name "hosts")
                        (local-host-first t)
                        (delay 15)
                        (kill-if-no-progress 600)
                        (command-name "pc-nqthm")
                        (completion-function #'nqthm-completed)
                        (front-end (concatenate 'string *system-parallel-directory*
                                                "front-end.lsp"))
                        (back-end (concatenate 'string *system-parallel-directory*
                                               "back-end.lsp"))
                        (nice-flag t)
                        (job-list nil)
                        &aux
                        (temp-directory-name "temp/") ;auxiliary because we use it in the shell command
                        (time-since-last-progress-check 0)
                        temp-time
                        (bad-jobs nil)
                        *start-time* *local-host-name* *user-name* *uniq-value* *long-job-info-filename*
                        (*failed-job-names* nil)
                        hosts-jobs-alist
                        jobs-unassigned
                        (finished-normally nil))

  ;; set *start-time*, *local-host-name*, *user-name*, *uniq-value*, and *long-job-info-filename*
  ;; here:  Must do these early for the OUTPUT-JOB-INFO in the unwind-protect
  (format t "~%Initializing job information...")
  (initialize-job-info)
  (format t " done.~%")

  (unwind-protect
      (progn

        (check-lock-on-par-jobs)
        (lock-on-par-jobs)

        (chk-file-or-directory-exists jobs-directory-name t)
        (chk-file-or-directory-exists output-directory-name t)
        (chk-file-or-directory-exists temp-directory-name t)

        (clear-directory output-directory-name *clear-query-flg*)
        (clear-directory temp-directory-name *clear-query-flg*)
        (terpri)
        (when job-list
          (let ((bad-files
                 (non-files-in-directory jobs-directory-name job-list)))
            (when bad-files
              (error "Bad job file name provided: ~%~s" bad-files))))

        ;; Set up local non-special variables
        (when nice-flag
          (setq command-name (concatenate 'string "nice " command-name)))
        (setq jobs-unassigned
              (if job-list
                  job-list
                (list-directory jobs-directory-name)))
        (setq hosts-jobs-alist ;uses *local-host-name*
              (make-hosts-jobs-alist
               (lines-from hosts-file-name)
               local-host-first))
        (format t "~%Hosts requested: ~S.~%"
                (iterate for host-job in hosts-jobs-alist
                         collect (car host-job)))
        (format t "~%Hosts currently blocked: ~S.~%"
                (let ((x (remove *local-host-name* (blocked-hosts) :test #'equal)))
                  (or x 'none)))
        (output-job-info "PAR-START"
                         (iterate for host-job in hosts-jobs-alist
                                  collect (car host-job)))

        (loop

         ;; the first pass is ignored for time-since-last-progress-check, but that's OK.
         (let ((temp (update-completed-jobs-records
                      completion-function hosts-jobs-alist jobs-unassigned
                      temp-directory-name output-directory-name)))
           (setq hosts-jobs-alist (car temp))
           (setq jobs-unassigned (cdr temp)))

         (when (and *kill-dispatcher-upon-seeing-failure* *failed-job-names*)
           (format t "~%~%******** KILLING ENTIRE JOB -- if you don't like abortion,~%~
                      (setq *KILL-DISPATCHER-UPON-SEEING-FAILURE* NIL).")
           (return nil))

         (when (> time-since-last-progress-check kill-if-no-progress)
           (setq time-since-last-progress-check 0)
           (when (setq bad-jobs (find-bombed-jobs hosts-jobs-alist
                                                  kill-if-no-progress temp-directory-name))
             (kill-processes bad-jobs temp-directory-name)
             (setq hosts-jobs-alist
                   (remove-processes-from-hosts-jobs-alist
                    hosts-jobs-alist bad-jobs))
             (setq jobs-unassigned
                   (append jobs-unassigned
                           (iterate for job in bad-jobs
                                    collect (job-name job))))))

         (setq temp-time (get-universal-time))

         (let ((temp (assign-jobs hosts-jobs-alist jobs-unassigned command-name
                                  jobs-directory-name front-end back-end)))
           (setq hosts-jobs-alist (car temp))
           (setq jobs-unassigned (cdr temp))
           (if (all-jobs-completed hosts-jobs-alist)
               (return t)
             (sleep delay)))

         (setq time-since-last-progress-check
               (+ time-since-last-progress-check
                  (- (get-universal-time) temp-time))))
        ;; end of loop

        (setq finished-normally t)

        (cond (*failed-job-names*
               (format t "~%~%FAILED JOBS:  ~S~%" *failed-job-names*)
               nil)
              (jobs-unassigned
               (format t "~%JOBS LEFT TO BE DONE -- NO HOSTS AVAILABLE!~%")
               (format t "~%~%Jobs left to do:~%  ~S" jobs-unassigned)
               nil)
              (t t)))

    (remove-lock-on-par-jobs)
    (output-job-info "PAR-END"
                     (iterate for host-job in hosts-jobs-alist
                              collect (car host-job)))
    (when (not finished-normally)
      (format t "~%Killing any outstanding par jobs.~%")
      (system *parkill-command*))))

(defun find-bombed-jobs (hosts-jobs-alist kill-if-no-progress temp-directory-name)
  ; returns a list of all jobs which have made no progress since "kill-time"
  (let ((kill-time (- (get-file-server-time) kill-if-no-progress)))
    (iterate for host-job in hosts-jobs-alist
             when (and
                   (cdr host-job)
                   (let ((temp-output-file (temp-output-file-name temp-directory-name (cdr host-job))))
                     (or
                      (not (probe-file temp-output-file))
                      (< (file-write-date temp-output-file) kill-time))))
             collect (cdr host-job))))
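
;; A minimal sketch of the staleness test used above, written as a pure
;; function so that it can be tried without a file system.  STALE-OUTPUT-P
;; is a hypothetical helper and not part of the dispatcher; it assumes that
;; all times are universal times in seconds and that a missing output file
;; is represented by a write date of NIL.

(defun stale-output-p (write-date now kill-if-no-progress)
  ;; A job counts as bombed when its temp output file is missing or was
  ;; last written before the kill time.
  (let ((kill-time (- now kill-if-no-progress)))
    (or (null write-date)
        (< write-date kill-time))))

;; With NOW = 1000 and a threshold of 600 seconds the kill time is 400, so
;; (stale-output-p 100 1000 600) is true and (stale-output-p 900 1000 600)
;; is false.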

(defun kill (pid)
  (system (concatenate 'string "kill -KILL " (prin1-to-string pid))))

(defun get-pid (job temp-directory)
  ; pid is first line of temp output file
  (iterate for i from 1 to 2 ;could be greater than 2 if we want to try more often
           with ans
           until ans
           do
           (cond
            ((setq ans (read-one-form (temp-output-file-name temp-directory job)))
             (return ans))
            (t (format t "~%Trying again to read pid of ~a~%" job)
               (sleep 2)))
           finally (error "Unable to get pid of ~a" job)))

(defun kill-processes (bad-jobs temp-directory)
  (iterate for job in bad-jobs
           do (kill (get-pid job temp-directory))))

(defun remove-processes-from-hosts-jobs-alist (hosts-jobs-alist bad-jobs)
  ; Note that bad-jobs is in same order as hosts-jobs-alist
  (iterate for host-job in hosts-jobs-alist
           collect
           (cond
            ((null bad-jobs) host-job)
            ((and
              (cdr host-job)
              (eq (job-number (car bad-jobs)) (job-number (cdr host-job))))
             (setq bad-jobs (cdr bad-jobs))
             (output-host-job "KILLED" host-job)
             (cons (car host-job) nil))
            (t host-job))))

(defun block-spawn (host)
  (and
   (not (equal host *local-host-name*))
   (probe-file (concatenate 'string *system-parallel-directory*
                            *protected-hosts-subdirectory* host))))

(defun assign-jobs
  (hosts-jobs-alist jobs-unassigned command-name jobs-directory-name
   front-end back-end)

  ;; This function assumes that updating of HOSTS-JOBS-ALIST (to reflect completion
  ;; of jobs) has already been performed.  That is:  It's not this function's job
  ;; to do that updating.  At this point we already know that if a job may be
  ;; assigned to the CAR of the pair then the CDR of the pair is NIL, and vice-versa.

  ;; This returns (CONS <new hosts-jobs-alist> <new-jobs-unassigned>).  Its side
  ;; effect is to assign new jobs (from JOBS-UNASSIGNED) to those hosts
  ;; which are currently idle, additionally instructing the
  ;; hosts to create .finish files upon completion.

  (cons (iterate for pair in hosts-jobs-alist
                 with next-job
                 collect
                 (cond
                  ((null jobs-unassigned) pair)
                  ((and (null (cdr pair))
                        (not (block-spawn (car pair))))
                   (setq next-job (make-job :name (car jobs-unassigned)
                                            :number (setq *current-job-unique-number*
                                                          (1+ *current-job-unique-number*))))
                   (setq jobs-unassigned (cdr jobs-unassigned))
                   (system (system-job-command (car pair) next-job command-name
                                               jobs-directory-name front-end back-end))
                   (output-host-job "starting" (setq pair (cons (car pair) next-job)))
                   pair)
                  (t pair)))
        jobs-unassigned))
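
;; A minimal sketch of the assignment step described in the comments above,
;; with the side effects (the SYSTEM call, job numbering, host blocking)
;; left out.  ASSIGN-SKETCH is a hypothetical name, the host names and job
;; names in the example are made up, and portable LOOP is used in place of
;; ITERATE so that the sketch runs in any Common Lisp.

(defun assign-sketch (hosts-jobs-alist jobs-unassigned)
  ;; Returns (CONS <new hosts-jobs-alist> <remaining jobs>).  Every pair
  ;; whose CDR is NIL (an idle host) takes the next unassigned job.
  (cons (loop for pair in hosts-jobs-alist
              collect (if (and jobs-unassigned (null (cdr pair)))
                          (cons (car pair) (pop jobs-unassigned))
                          pair))
        jobs-unassigned))

;; (assign-sketch '(("a") ("b" . j1) ("c")) '(j2 j3 j4))
;; gives ((("a" . J2) ("b" . J1) ("c" . J3)) J4): hosts "a" and "c" were
;; idle and received J2 and J3, and J4 remains unassigned.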

(defun system-job-command (host-name new-job command-name jobs-directory-name front-end back-end)

  ;; Grotesque, but it works.  Notice that \ appears as \\ inside quotes ("--"), and
  ;; the single quotes are there to protect names with "funny characters" like | from the
  ;; shell command processor.

(fomot  ail  "/loool/bla/porooh -X  \V*-X» \V  V\"-X'\V  \\"-X'\\'  U"-X'\\'  \V"-X'\\'  \\''-X'\\'  «- 
          (concatenate 'string *system-parallel-directory* "PAR")
          host-name
          command-name
          (job-number new-job)
          front-end
          (job-file-name jobs-directory-name new-job)
          back-end))

(defun job-completed (job temp-directory-name)
  (probe-file (temp-finish-file-name temp-directory-name job)))

(defun move-temp-files-to-output (job temp-directory-name output-directory-name)
  ; Moves the output and status files from the temp directory to the output directory.
  ; If the flag *save-temp-files-flg* is non-nil then this does a copy instead of a
  ; move (for debugging only, probably).
  (system (format nil
                  (if *save-temp-files-flg*
                      "cp '~a' '~a' ; cp '~a' '~a'"
                      "mv '~a' '~a' ; mv '~a' '~a'")
                  (temp-output-file-name temp-directory-name job)
                  (output-file-name output-directory-name job)
                  (temp-status-file-name temp-directory-name job)
                  (status-file-name output-directory-name job))))

(defun update-completed-jobs-records
  (completion-function hosts-jobs-alist jobs-unassigned temp-directory-name
   output-directory-name &aux success-status)

  ;; Returns new versions of HOSTS-JOBS-ALIST and JOBS-UNASSIGNED to reflect
  ;; known job completions (and failures).

  (cons
   (iterate for pair in hosts-jobs-alist
            collect
            (if (cdr pair)
                (cond
                 ((job-completed (cdr pair) temp-directory-name)
                  (if (setq success-status
                            (funcall completion-function
                                     (temp-status-file-name
                                      temp-directory-name (cdr pair))))
                      (progn
                        (output-host-job "completed" pair)
                        (when (not (eq (car success-status) 'success))
                          (setq *failed-job-names*
                                (cons (job-name (cdr pair)) *failed-job-names*)))
                        (when (cdr success-status)
                          (fresh-line)
                          (format t "  (Job #~a)  " (job-number (cdr pair)))
                          (apply #'format t (cdr success-status))
                          (fresh-line))
                        (move-temp-files-to-output (cdr pair) temp-directory-name
                                                   output-directory-name))
                      (progn
                        (output-host-job "***NOT completed" pair)
                        (setq jobs-unassigned (append jobs-unassigned
                                                      (list (job-name (cdr pair)))))))
                  (cons (car pair) nil))
                 (t pair))
                pair))
   jobs-unassigned))

(defun all-jobs-completed (hosts-jobs-alist)
  (iterate for x in hosts-jobs-alist
           always (null (cdr x))))
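
;; A short sketch of the HOSTS-JOBS-ALIST shape that the functions above
;; rely on: each element is a pair (host . job), and a CDR of NIL marks an
;; idle host.  IDLE-HOSTS is a hypothetical helper, the host and job names
;; in the example are made up, and portable LOOP is used in place of ITERATE.

(defun idle-hosts (hosts-jobs-alist)
  ;; Collects the hosts that currently have no job assigned.
  (loop for pair in hosts-jobs-alist
        when (null (cdr pair))
        collect (car pair)))

;; (idle-hosts '(("host-a" . job-1) ("host-b") ("host-c" . job-2)))
;; gives ("host-b").  All jobs are completed exactly when every host in the
;; alist is idle.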

; top.lsp


;; Most of this file is pilfered (and modified) from an acl file, which
;; in turn borrows freely from Boyer and Moore's file nqthm.lisp.

(defun nqthm-loaded ()
  (fboundp 'iterate))

(when (not (nqthm-loaded))
  (load "/usr/local/src/parallel/from-nqthm"))

(defvar dispatcher-code-files
  '("/local/src/parallel/dispatch" "/local/src/parallel/fca"))

(defun already-compiledp (filename &aux
                          (lisp-filename (concatenate 'string filename ".lsp"))
                          (bin-filename (concatenate 'string filename ".o")))
  (and (probe-file lisp-filename)
       (probe-file bin-filename)
       (< (file-write-date lisp-filename) (file-write-date bin-filename))))

(defun first-file-to-compile (filenames)
  (iterate for name in filenames
           when (not (already-compiledp name))
           do (return name)))

(DEFUN COMPILE-dispatcher (&aux (first-file-to-compile (first-file-to-compile dispatcher-code-files)))
  (if (null first-file-to-compile)
      (format t "~%All dispatcher files are already compiled.~%~%")
      (FLET ((LF (N)
               (LOAD (concatenate 'string n ".o")))
             (CF (N)
               (COMPILE-FILE (concatenate 'string n ".lsp"))))
        ; could have PROCLAIM form here, as in nqthm
        (iterate for file in dispatcher-code-files
                 with compile-flg
                 do (cond (compile-flg
                           (cf file) (lf file))
                          ((equalp file first-file-to-compile)
                           (setq compile-flg t)
                           (cf file) (lf file))
                          (t (lf file)))))))
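
;; A minimal sketch of the rule COMPILE-dispatcher follows: files before the
;; first out-of-date file are only loaded, and that file and every file after
;; it are compiled and then loaded.  COMPILE-PLAN is a hypothetical helper
;; that returns the plan as data instead of compiling anything; STALE-P is an
;; assumed predicate standing in for (not (already-compiledp name)).

(defun compile-plan (files stale-p)
  (let ((first-stale (find-if stale-p files))
        (compile-flg nil))
    (loop for file in files
          do (when (equal file first-stale)
               (setq compile-flg t))
          collect (cons file (if compile-flg :compile-and-load :load)))))

;; (compile-plan '("a" "b" "c") (lambda (f) (equal f "b")))
;; gives (("a" . :LOAD) ("b" . :COMPILE-AND-LOAD) ("c" . :COMPILE-AND-LOAD)).
;; Later files are recompiled even when their own binaries are current,
;; presumably because they may depend on the file that changed.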

;; Invoking (load-dispatcher) is all it takes to build a runnable version of
;; this system, assuming that you have compiled it.

(defun load-dispatcher (&aux badfile)
  (when (setq badfile (first-file-to-compile dispatcher-code-files))
    (format t "WARNING:  The file ~a should be compiled."
            badfile))
  (FLET ((LF (N)
           (LOAD (concatenate 'string n ".o"))))
    (iterate for file in dispatcher-code-files
             do (lf file))))

;; /local/src/parallel/front-end-with-doc.lsp


(setq *break-enable* nil)
(setq sys::*notify-gbc* t)

; to be input for particular jobs

(defvar *start-position*) ;position of applicable note-lib or boot-strap

(defvar *start-name*) ;name of first event to be possibly proved

(defvar *finish-name*) ;name of event which terminates the job (maybe NIL, for now)


; The following might be modified in the particular input file.

(defvar *par-directory-name* "/usr/home/par/")

(defvar *event-index* 0) ;which event we're currently looking at

(defvar *axiom-stage* t) ;when t, we should replace PROVE-LEMMAs by ADD-AXIOMs

(defvar *last-event-name*) ;name of event most recently read from the input stream

; (sleep 30)

(defun do-event (form)
  (setq *event-index* (1+ *event-index*))
  (setq *last-event-name* (cadr form))
  (cond
   ((< *event-index* *start-position*)
    t)
   (t
    (when *axiom-stage*
      (cond
       ((eq (cadr form) *start-name*)
        (setq *axiom-stage* nil))
       ((eq (car form) 'prove-lemma)
        (setq form
              `(add-axiom
                ,(cadr form) ,(caddr form) ,(cadddr form))))
       ((eq (car form) 'lemma)
        (setq form
              `(axiom
                ,(cadr form) ,(caddr form) ,(cadddr form))))))
    (ppr form nil)
    (terpri nil)
    (eval form))))
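
;; A minimal sketch of the rewriting that DO-EVENT performs while
;; *axiom-stage* is true, pulled out as a pure function so it can be tried
;; without NQTHM loaded.  AXIOMATIZE-EVENT is a hypothetical name, and the
;; event in the example is made up for illustration.

(defun axiomatize-event (form)
  ;; Keeps the name, rule classes and term of the event and drops anything
  ;; after them (such as hints), since an axiom needs no proof.
  (case (car form)
    (prove-lemma `(add-axiom ,(cadr form) ,(caddr form) ,(cadddr form)))
    (lemma `(axiom ,(cadr form) ,(caddr form) ,(cadddr form)))
    (t form)))

;; (axiomatize-event
;;  '(prove-lemma plus-0 (rewrite) (equal (plus x 0) (fix x)) ((induct (plus x 0)))))
;; gives (ADD-AXIOM PLUS-0 (REWRITE) (EQUAL (PLUS X 0) (FIX X))).
;; Events of any other kind, such as definitions, pass through unchanged.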

(defun format1 (string &rest args)
  ; notice that the call to echo2 guarantees that we'll overwrite the err
  (system
   (concatenate 'string
                "echo2 '"
                (apply #'format nil string args)
                "'")))

(defun format-nqthm-status (string &rest args)
  (apply #'format1 (concatenate 'string *output-completed-string* string) args))

(defun par-nqthm-top-level (&aux next-par-form init success)
  (unwind-protect
      (loop
        (setq init nil)
        (setq next-par-form (read *standard-input* nil a-very-rare-cons))
        (setq init t)
        (if (or (eq next-par-form a-very-rare-cons)
                (and *finish-name* (eq (cadr next-par-form) *finish-name*)))
            (return (setq success t))
            (or (prog1 (print (do-event next-par-form))
                       (terpri nil) (terpri nil))
                (return (setq success nil)))))
    (cond
     ((null init)
      (format-nqthm-status "Failure:  Unsuccessful read after ~a" (cadr next-par-form)))
     (success
      (format-nqthm-status "Success!"))
     (t (format-nqthm-status "FAILURE:  The event ~a failed."
                             *last-event-name*)))
    (bye (if success 0 1))))


;; par


#!/bin/csh

# par [command-name] [unique_number] [file]

# sends process number of the rsh call followed by standard output of rsh, all to the standard output
echo $$ > temp/output.$3

(cat $4 $5 $6 | rsh $1 $2 ';' echo2 '$status' >> temp/output.$3) >& temp/status.$3
echo " " > temp/finish.$3